While estimation of the marginal (total) causal effect of a point 
exposure on an outcome is arguably the most common objective of ex- 
perimental and observational studies in the health and social sciences, 
in recent years, investigators have also become increasingly interested 
in mediation analysis. Specifically, upon evaluating the total effect of 
the exposure, investigators routinely wish to make inferences about 
the direct or indirect pathways of the effect of the exposure, through 
a mediator variable or not, that occurs subsequently to the expo- 
sure and prior to the outcome. Although powerful semiparametric 
methodologies have been developed to analyze observational stud- 
ies that produce double robust and highly efficient estimates of the 
marginal total causal effect, similar methods for mediation analysis 
are currently lacking. Thus, this paper develops a general semipara- 
metric framework for obtaining inferences about so-called marginal 
natural direct and indirect causal effects, while appropriately ac- 
counting for a large number of pre-exposure confounding factors for 
the exposure and the mediator variables. Our analytic framework 
is particularly appealing, because it gives new insights on issues of 
efficiency and robustness in the context of mediation analysis. In par- 
ticular, we propose new multiply robust locally efficient estimators of 
the marginal natural indirect and direct causal effects, and develop 
a novel double robust sensitivity analysis framework for the assump- 
tion of ignorability of the mediator variable. 

1. Introduction. The evaluation of the total causal effect of a given point 
exposure, treatment or intervention on an outcome of interest is arguably 
the most common objective of experimental and observational studies in the 
fields of epidemiology, biostatistics and in the social sciences. However, in 
recent years, investigators in these various fields have become increasingly 
interested in making inferences about the direct or indirect pathways of the 
exposure effect, through a mediator variable or not, that occurs subsequently 
to the exposure and prior to the outcome. Recently, the counterfactual lan- 
guage of causal inference has proven particularly useful for formalizing medi- 
ation analysis. Indeed, causal inference offers a formal mathematical frame- 
work for defining varieties of direct and indirect effects, and for establishing 
necessary and sufficient identifying conditions of these effects. A notable 
contribution of causal inference to the literature on mediation analysis is 
the key distinction drawn between so-called controlled direct effects versus 
natural direct effects. In words, the controlled direct effect refers to the ex- 
posure effect that arises upon intervening to set the mediator to a fixed 
level that may differ from its actual observed value [Robins and Greenland 
(1992), Pearl (2001), Robins (2003)]. In contrast, the natural (also known as 
pure) direct effect captures the effect of the exposure when one intervenes 
to set the mediator to the (random) level it would have been in the absence 
of exposure [Robins and Greenland (1992), Pearl (2001)]. As noted by Pearl 
(2001), controlled direct and indirect effects are particularly relevant for pol- 
icy making, whereas natural direct and indirect effects are more useful for 
understanding the underlying mechanism by which the exposure operates. 
In fact, natural direct and indirect effects combine to produce the exposure 
total effect. 

To formally define natural direct and indirect effects first requires defining 
counterfactuals. We assume that for each level $(e,m)$ of a binary exposure E and 
of a mediator variable M, there exists a counterfactual variable $Y_{e,m}$ corresponding 
to the outcome Y had, possibly contrary to fact, the exposure and 
mediator variables taken the value (e,m). Similarly, for E = e, we assume 
there exists a counterfactual variable $M_e$ corresponding to the mediator variable 
had, possibly contrary to fact, the exposure variable taken the value e. 
The current paper concerns the decomposition of the total effect of E on Y, 
in terms of natural direct and natural indirect effects, which, expressed on 
the mean difference scale, is given by 

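\[
\underbrace{\mathbb{E}(Y_{e=1} - Y_{e=0})}_{\text{total effect}}
  = \mathbb{E}(Y_{e=1,M_{e=1}} - Y_{e=0,M_{e=0}})
  = \underbrace{\mathbb{E}(Y_{e=1,M_{e=1}} - Y_{e=1,M_{e=0}})}_{\text{natural indirect effect}}
  + \underbrace{\mathbb{E}(Y_{e=1,M_{e=0}} - Y_{e=0,M_{e=0}})}_{\text{natural direct effect}},
\tag{1}
\]

where $\mathbb{E}$ stands for expectation. 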

In an effort to account for confounding bias when estimating causal ef- 
fects, such as the average total effect (1) from nonexperimental data, in- 
vestigators routinely collect and adjust for in data analysis, a large number 
of confounding factors. Because of the curse of dimensionality, nonparametric 
methods of estimation are typically not practical in such settings, and 
one usually resorts to one of two dimension-reduction strategies: either one 
relies on a model for the outcome given exposure and confounders, or alternatively 
one relies on a model for the exposure, that is, the propensity score. 
Recently, powerful semiparametric methods have been developed to ana- 
lyze observational studies that produce so-called double robust and highly 
efficient estimates of the exposure total causal effect [Robins (2000), Scharf- 
stein, Rotnitzky and Robins (1999), Bang and Robins (2005), Tsiatis (2006)] 
and similar methods have also been developed to estimate controlled direct 
effects [Goetgeluk, Vansteelandt and Goetghebeur (2008)]. An important 
advantage of a double robust method is that it carefully combines both of 
the aforementioned dimension reduction strategies for confounding adjust- 
ment, to produce an estimator of the causal effect that remains consistent 
and asymptotically normal, provided at least one of the two strategies is cor- 
rect, without necessarily knowing which strategy is indeed correct [van der 
Laan and Robins (2003)]. Unfortunately, similar methods for making semi- 
parametric inferences about marginal natural direct and indirect effects are 
currently lacking. Thus, this paper develops a general semiparametric frame- 
work for obtaining inferences about marginal natural direct and indirect ef- 
fects on the mean of an outcome, while appropriately accounting for a large 
number of confounding factors for the exposure and the mediator variables. 

Our semiparametric framework is particularly appealing, as it gives new 
insight on issues of efficiency and robustness in the context of mediation 
analysis. Specifically, in Section 2, we adopt the sequential ignorability assumption 
of Imai, Keele and Tingley (2010) under which, in conjunction with 
the standard consistency and positivity assumptions, we derive the efficient 
influence function and thus obtain the semiparametric efficiency bound for 
the natural direct and natural indirect marginal mean causal effects, in the 
nonparametric model $\mathcal{M}_{\mathrm{nonpar}}$ in which the observed data likelihood is left 
unrestricted. We further show that in order to conduct mediation inferences 
in $\mathcal{M}_{\mathrm{nonpar}}$, one must estimate at least a subset of the following quantities: 

(i) the conditional expectation of the outcome given the mediator, ex- 
posure and confounding factors; 

(ii) the density of the mediator given the exposure and the confounders; 

(iii) the density of the exposure given the confounders. 

Ideally, to minimize the possibility of modeling bias, one may wish to 
estimate each of these quantities nonparametrically; however, as previously 
argued, when, as we assume throughout, one wishes to account for numerous 
confounders, such nonparametric estimates will likely perform poorly in finite 
samples. Thus, in Section 2.3 we develop an alternative multiply robust 
strategy. To do so, we propose to model (i), (ii) and (iii) parametrically (or 
semiparametrically), but rather than obtaining mediation inferences that 
rely on the correct specification of a specific subset of these models, we 
instead carefully combine these three models to produce estimators of 
the marginal mean direct and indirect effects that remain consistent and 
asymptotically normal (CAN) in a union model, where at least one but not 
necessarily all of the following conditions hold: 

(a) the parametric or semiparametric models for the conditional expectation 
of the outcome (i) and for the conditional density of the mediator (ii) 
are correctly specified; 

(b) the parametric or semiparametric models for the conditional expec- 
tation of the outcome (i) and for the conditional density of the exposure (iii) 
are correctly specified; 

(c) the parametric or semiparametric models for the conditional densities 
of the exposure and the mediator (ii) and (iii) are correctly specified. 

Accordingly, we define submodels $\mathcal{M}_a$, $\mathcal{M}_b$ and $\mathcal{M}_c$ of $\mathcal{M}_{\mathrm{nonpar}}$ corresponding 
to models (a), (b) and (c) respectively. Thus, the proposed approach 
is triply robust as it produces valid inferences about natural direct 
and indirect effects in the union model $\mathcal{M}_{\mathrm{union}} = \mathcal{M}_a \cup \mathcal{M}_b \cup \mathcal{M}_c$. Furthermore, 
as we later show in Section 2.3, the proposed estimators are also 
locally semiparametric efficient in the sense that they achieve the respective 
efficiency bounds for estimating the natural direct and indirect effects 
in $\mathcal{M}_{\mathrm{union}}$ at the intersection submodel 

\[
\mathcal{M}_a \cap \mathcal{M}_b \cap \mathcal{M}_c = \mathcal{M}_a \cap \mathcal{M}_c = \mathcal{M}_a \cap \mathcal{M}_b = \mathcal{M}_b \cap \mathcal{M}_c
\subset \mathcal{M}_{\mathrm{union}} \subset \mathcal{M}_{\mathrm{nonpar}}.
\]

Section 3 summarizes a simulation study illustrating the finite sample 
performance of the various estimators described in Section 2, and Section 4 
gives a real data application of these methods. Section 5 describes a strategy 
to improve the stability of the proposed multiply robust estimator which di- 
rectly depends on inverse exposure and mediator density weights, when such 
weights are highly variable, and Section 6 demonstrates the favorable per- 
formance of two modified multiply robust estimators in the context of such 
highly variable weights. In Section 7, we compare the proposed methodology 
to the prevailing estimators in the literature. Based on this comparison, we 
conclude that the new approach should generally be preferred because an 
inference under the proposed method is guaranteed to remain valid under 
many more data generating laws than an inference based on each of the other 
existing approaches. In particular, as we argue below, the approach of van der 
Laan and Petersen (2005) is not entirely satisfactory because, despite producing 
a CAN estimator of the marginal direct effect under the union model 
$\mathcal{M}_a \cup \mathcal{M}_c$ (and therefore an estimator that is double robust), their estimator 
requires a correct model for the density of the mediator. Thus, unlike the 
direct effect estimator developed in this paper, the van der Laan estimator 
fails to be consistent under the submodel $\mathcal{M}_b \subset \mathcal{M}_{\mathrm{union}}$. Nonetheless, the 
estimator of van der Laan is in fact locally efficient in model $\mathcal{M}_a \cup \mathcal{M}_c$, provided 
the model for the mediator's conditional density is either known, or 
can be efficiently estimated. This property is confirmed in a supplementary 
online Appendix [Tchetgen Tchetgen and Shpitser (2012)], where we also 
provide a general map that relates the efficient influence function for model 
$\mathcal{M}_{\mathrm{union}}$ to the corresponding efficient influence function for model $\mathcal{M}_a \cup \mathcal{M}_c$ 
assuming an arbitrary parametric or semiparametric model for the mediator 
conditional density is correctly specified. In Section 8, we describe a novel 
double robust sensitivity analysis framework to assess the impact on inferences 
about the natural direct effect, of a departure from the ignorability 
assumption of the mediator variable. We conclude with a brief discussion. 

2. The nonparametric mediation functional. 

2.1. Identification. Suppose i.i.d. data on O = (Y,E,M,X) are collected 
for n subjects. Recall that Y is an outcome of interest, E is a binary exposure 
variable, M is a mediator variable with support $\mathcal{S}$, known to occur 
subsequently to E and prior to Y, and X is a vector of pre-exposure variables 
with support $\mathcal{X}$ that confound the association between (E, M) and Y. The 
overarching goal of this paper is to provide some theory of inference about 
the fundamental functional of mediation analysis which Judea Pearl calls 
"the mediation causal formula" [Pearl (2011)] and which, expressed on the 
mean scale, is 

\[
\theta_0 = \iint_{\mathcal{S}\times\mathcal{X}} \mathbb{E}(Y \mid E=1, M=m, X=x)\,
  f_{M|E,X}(m \mid E=0, X=x)\, f_X(x)\, d\mu(m,x),
\tag{2}
\]

where $f_{M|E,X}$ and $f_X$ are respectively the conditional density of the mediator M 
given (E,X) and the density of X, and $\mu$ is a dominating measure for 
the distribution of (M,X). Hereafter, to keep with standard statistical parlance, 
we shall simply refer to $\theta_0$ as the "mediation functional" or "M-functional" 
since it is formally a functional on the nonparametric statistical 
model $\mathcal{M}_{\mathrm{nonpar}} = \{F_O(\cdot) : F_O \text{ unrestricted}\}$ of all regular laws $F_O$ of the observed 
data O that satisfy the positivity assumption given below; that is, 
$\theta_0 = \theta_0(F_O) : \mathcal{M}_{\mathrm{nonpar}} \to \mathcal{R}$, with $\mathcal{R}$ the real line. The functional is of keen 
interest here because it arises in the estimation of natural direct and indirect 
effects as we describe next. To do so, we make the consistency assumption. 
Consistency: 

    if $E = e$, then $M_e = M$ w.p.1, and 

    if $E = e$ and $M = m$, then $Y_{e,m} = Y$ w.p.1. 

In addition, we adopt the sequential ignorability assumption of Imai, 
Keele and Tingley (2010) which states that for $e, e' \in \{0, 1\}$: 
Sequential ignorability: 

\[
\{Y_{e',m}, M_e\} \perp\!\!\!\perp E \mid X, \qquad
Y_{e',m} \perp\!\!\!\perp M \mid E = e, X,
\]

where $A \perp\!\!\!\perp B \mid C$ states that A is independent of B given C; paired with the 
following: 
Positivity: 

    $f_{M|E,X}(m \mid E, X) > 0$ w.p.1 for each $m \in \mathcal{S}$, and 

    $f_{E|X}(e \mid X) > 0$ w.p.1 for each $e \in \{0, 1\}$. 

Then, under the consistency, sequential ignorability and positivity assumptions, 
Imai, Keele and Tingley (2010) showed that $\theta_0 = \mathbb{E}(Y_{1,M_0})$ and 

\[
\begin{aligned}
\delta_e &= \int_{\mathcal{X}} \mathbb{E}(Y \mid E=e, X=x)\, f_X(x)\, d\mu(x) \\
 &= \iint_{\mathcal{S}\times\mathcal{X}} \mathbb{E}(Y \mid E=e, M=m, X=x)\,
    f_{M|E,X}(m \mid E=e, X=x)\, f_X(x)\, d\mu(m,x) \\
 &= \mathbb{E}(Y_e) = \mathbb{E}(Y_{e,M_e}), \qquad e = 0, 1,
\end{aligned}
\tag{3}
\]

so that $\mathbb{E}(Y_{1,M_0})$ and $\mathbb{E}(Y_e)$, e = 0, 1, are identified from the observed data, 
and so are the mean natural direct effect $\mathbb{E}(Y_{1,M_0}) - \mathbb{E}(Y_0) = \theta_0 - \delta_0$ and 
the mean natural indirect effect $\mathbb{E}(Y_1) - \mathbb{E}(Y_{1,M_0}) = \delta_1 - \theta_0$. For binary Y, 
one might alternatively consider the natural direct effect on the risk ratio 
scale $\mathbb{E}(Y_{1,M_0})/\mathbb{E}(Y_0) = \theta_0/\delta_0$ or on the odds ratio scale 
$\{\mathbb{E}(Y_{1,M_0})\mathbb{E}(1 - Y_0)\}/\{\mathbb{E}(1 - Y_{1,M_0})\mathbb{E}(Y_0)\} = \{\theta_0(1 - \delta_0)\}/\{\delta_0(1 - \theta_0)\}$, and similarly defined 
natural indirect effects on the risk ratio and odds ratio scales. It is instructive 
to contrast the expression (2) for $\mathbb{E}(Y_{1,M_0})$ with the expression (3) for 
e = 1 corresponding to $\mathbb{E}(Y_1)$, and to note that the two expressions bear 
a striking resemblance, except that the density of the mediator in the first 
expression conditions on the unexposed (with E = 0), whereas in the second 
expression, the mediator density is conditional on the exposed (with E = 1). 
As we demonstrate below, this subtle difference has remarkable implications 
for inference. 
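To make the contrast between expressions (2) and (3) concrete, the following minimal sketch evaluates both by direct summation for a binary mediator and a single binary confounder. The probability tables and the outcome regression `mean_y` are hypothetical values chosen purely for illustration; they are not taken from the paper.

```python
import numpy as np

# Hypothetical discrete law with binary X and binary M; every number below
# is illustrative and not taken from the paper.
f_x = np.array([0.6, 0.4])                 # f_X(x) for x = 0, 1
p_m1 = np.array([[0.3, 0.5],               # P(M = 1 | E = e, X = x):
                 [0.6, 0.8]])              # rows index e, columns index x


def mean_y(e, m, x):
    """Illustrative outcome regression E(Y | E = e, M = m, X = x)."""
    return 1.0 + 0.5 * x - 2.0 * e - 1.0 * m + 3.0 * e * m


def functional(e_out, e_med):
    """Sum over (m, x) of E(Y | e_out, m, x) f(m | e_med, x) f(x)."""
    total = 0.0
    for x in (0, 1):
        for m in (0, 1):
            f_m = p_m1[e_med, x] if m == 1 else 1.0 - p_m1[e_med, x]
            total += mean_y(e_out, m, x) * f_m * f_x[x]
    return total


theta0 = functional(1, 0)   # expression (2): E(Y_{1, M_0})
delta1 = functional(1, 1)   # expression (3) with e = 1: E(Y_1)
delta0 = functional(0, 0)   # expression (3) with e = 0: E(Y_0)

nde = theta0 - delta0       # natural direct effect
nie = delta1 - theta0       # natural indirect effect
print(f"theta0={theta0:.2f}, NDE={nde:.2f}, NIE={nie:.2f}, total={delta1 - delta0:.2f}")
```

The only difference between `theta0` and `delta1` is the exposure level at which the mediator distribution is evaluated, and the two effects add up to the total effect as in (1).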

Pearl (2001) was the first to derive the M-functional $\theta_0 = \mathbb{E}(Y_{1,M_0})$ under 
a different set of assumptions. Others have since contributed alternative 
sets of identifying assumptions. In this paper, we have chosen to work under 
the sequential ignorability assumption of Imai, Keele and Yamamoto (2010), 
Imai, Keele and Tingley (2010), but note that alternative related assumptions 
exist in the literature [Robins and Greenland (1992), Pearl (2001), 
van der Laan and Petersen (2005), Hafeman and Vanderweele (2011)]. We 
also note that Robins and Richardson (2012) disagree with the label 
"sequential ignorability" because its terminology has previously carried a different 
interpretation in the literature. Nonetheless, the assumption entails 
two ignorability-like assumptions that are made sequentially. First, given 
the observed pre-exposure confounders, the exposure assignment is assumed 
to be ignorable, that is, statistically independent of potential outcomes and 
potential mediators. The second part of the assumption states that the medi- 
ator is ignorable given the observed exposure and pre-exposure confounders. 
Specifically, the second part of the sequential ignorability assumption is con- 
ditional on the observed value of the ignorable treatment and the observed 
pretreatment confounders. We note that the second part of the sequential 
ignorability assumption is particularly strong and must be made with care. 
This is partly because it is always possible that there might be unobserved 
variables that confound the relationship between the outcome and the medi- 
ator variables, even upon conditioning on the observed exposure and covari- 
ates. Furthermore, the confounders X must all be pre-exposure variables; 
that is, they must precede E. In fact, Avin, Shpitser and Pearl (2005) proved 
that without additional assumptions, one cannot identify natural direct and 
indirect effects if there are confounding variables that are affected by the 
exposure, even if such variables are observed by the investigator [also see 
Tchetgen Tchetgen and VanderWeele (2012)]. This implies that, similarly 
to the ignorability of the exposure in observational studies, ignorability of the 
mediator cannot be established with certainty, even after collecting as many 
pre-exposure confounders as possible. Furthermore, as Robins and Richard- 
son (2012) point out, whereas the first part of the sequential ignorability 
assumption could, in principle, be enforced in a randomized study, by ran- 
domizing E within levels of X; the second part of the sequential ignorability 
assumption cannot similarly be enforced experimentally, even by randomiza- 
tion. And thus, for this latter assumption to hold, one must entirely rely on 
expert knowledge about the mechanism under study. For this reason, it will 
be crucial in practice to supplement mediation analyses with a sensitivity 
analysis that accurately quantifies the degree to which results are robust to 
a potential violation of the sequential ignorability assumption. Later in the 
paper, we develop a variety of sensitivity analysis techniques that allow the 
analyst to quantify the degree to which his or her mediation analysis results 
are robust to a potential violation of the sequential ignorability assumption. 

2.2. Semiparametric efficiency bounds for $\mathcal{M}_{\mathrm{nonpar}}$. In this section, we 
derive the efficient influence function for the M-functional $\theta_0$ in $\mathcal{M}_{\mathrm{nonpar}}$. 
This result is then combined with the efficient influence function for the 
functional $\delta_e$ [Robins, Rotnitzky and Zhao (1994), Hahn (1998)] to obtain 
the efficient influence function for the natural direct and indirect effects on 
the mean difference scale. Thus, in the following, we shall use the efficient 
influence function $S^{\mathrm{eff,nonpar}}_{\delta_e}(\delta_e)$ of $\delta_e$ which is well known to be 

\[
S^{\mathrm{eff,nonpar}}_{\delta_e}(\delta_e)
  = \frac{I(E=e)}{f_{E|X}(e \mid X)}\{Y - \eta(e,e,X)\} + \eta(e,e,X) - \delta_e,
\]

where for $e, e^* \in \{0, 1\}$, we define 

\[
\eta(e, e^*, X) = \int_{\mathcal{S}} \mathbb{E}(Y \mid X, M=m, E=e)\, f_{M|E,X}(m \mid E=e^*, X)\, d\mu(m),
\]

so that $\eta(e,e,X) = \mathbb{E}(Y \mid X, E=e)$, e = 0, 1. 

The following theorem is proved in the Appendix. 

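Theorem 1. Under the consistency, sequential ignorability and positivity 
assumptions, the efficient influence function of the M-functional $\theta_0$ in 
model $\mathcal{M}_{\mathrm{nonpar}}$ is given by 

\[
\begin{aligned}
S^{\mathrm{eff,nonpar}}_{\theta_0}(\theta_0)
 &= \frac{I\{E=1\}\, f_{M|E,X}(M \mid E=0, X)}{f_{E|X}(1 \mid X)\, f_{M|E,X}(M \mid E=1, X)}
    \{Y - \mathbb{E}(Y \mid X, M, E=1)\} \\
 &\quad + \frac{I\{E=0\}}{f_{E|X}(0 \mid X)}\{\mathbb{E}(Y \mid X, M, E=1) - \eta(1,0,X)\}
    + \eta(1,0,X) - \theta_0,
\end{aligned}
\]

and the efficient influence functions of the natural direct and indirect effects 
on the mean difference scale in model $\mathcal{M}_{\mathrm{nonpar}}$ are respectively given by 

\[
\begin{aligned}
S^{\mathrm{eff,nonpar}}_{\mathrm{NDE}}(\theta_0, \delta_0)
 &= S^{\mathrm{eff,nonpar}}_{\theta_0}(\theta_0) - S^{\mathrm{eff,nonpar}}_{\delta_0}(\delta_0) \\
 &= \frac{I\{E=1\}\, f_{M|E,X}(M \mid E=0, X)}{f_{E|X}(1 \mid X)\, f_{M|E,X}(M \mid E=1, X)}
    \{Y - \mathbb{E}(Y \mid X, M, E=1)\} \\
 &\quad + \frac{I\{E=0\}}{f_{E|X}(0 \mid X)}
    \{\mathbb{E}(Y \mid X, M, E=1) - Y - \eta(1,0,X) + \eta(0,0,X)\} \\
 &\quad + \eta(1,0,X) - \eta(0,0,X) - \theta_0 + \delta_0,
\end{aligned}
\]

and 

\[
\begin{aligned}
S^{\mathrm{eff,nonpar}}_{\mathrm{NIE}}(\delta_1, \theta_0)
 &= S^{\mathrm{eff,nonpar}}_{\delta_1}(\delta_1) - S^{\mathrm{eff,nonpar}}_{\theta_0}(\theta_0) \\
 &= \frac{I\{E=1\}}{f_{E|X}(1 \mid X)}
    \left\{ Y - \eta(1,1,X) - \frac{f_{M|E,X}(M \mid E=0, X)}{f_{M|E,X}(M \mid E=1, X)}
    \{Y - \mathbb{E}(Y \mid X, M, E=1)\} \right\} \\
 &\quad - \frac{I\{E=0\}}{f_{E|X}(0 \mid X)}\{\mathbb{E}(Y \mid X, M, E=1) - \eta(1,0,X)\} \\
 &\quad + \eta(1,1,X) - \eta(1,0,X) + \theta_0 - \delta_1.
\end{aligned}
\]

Thus, the semiparametric efficiency bounds for estimating the natural direct 
and the natural indirect effects in $\mathcal{M}_{\mathrm{nonpar}}$ are respectively given by 

\[
\mathbb{E}\{S^{\mathrm{eff,nonpar}}_{\mathrm{NDE}}(\theta_0, \delta_0)^2\}^{-1}
\quad\text{and}\quad
\mathbb{E}\{S^{\mathrm{eff,nonpar}}_{\mathrm{NIE}}(\delta_1, \theta_0)^2\}^{-1}.
\]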

Although not presented here, Theorem 1 is easily extended to obtain 
the efficient influence functions and the respective semiparametric efficiency 
bounds for the direct and indirect effects on the risk ratio and the odds 
ratio scales by a straightforward application of the delta method. An important 
implication of the theorem is that all regular and asymptotically linear 
(RAL) estimators of $\theta_0$, $\theta_0 - \delta_0$ and $\delta_1 - \theta_0$ in model $\mathcal{M}_{\mathrm{nonpar}}$ share the common 
influence functions $S^{\mathrm{eff,nonpar}}_{\theta_0}(\theta_0)$, $S^{\mathrm{eff,nonpar}}_{\mathrm{NDE}}(\theta_0, \delta_0)$ and $S^{\mathrm{eff,nonpar}}_{\mathrm{NIE}}(\delta_1, \theta_0)$, 
respectively. Specifically, any RAL estimator $\hat\theta_0$ of the M-functional $\theta_0$ in 
model $\mathcal{M}_{\mathrm{nonpar}}$ shares a common asymptotic expansion, 

\[
n^{1/2}(\hat\theta_0 - \theta_0) = n^{1/2}\, \mathbb{P}_n S^{\mathrm{eff,nonpar}}_{\theta_0}(\theta_0) + o_p(1),
\]

where $\mathbb{P}_n[\cdot] = n^{-1} \sum_i [\cdot]_i$. To illustrate this property of nonparametric RAL 
estimators, and as a motivation to multiply robust estimation when nonparametric 
methods are not appropriate, we provide a detailed study of three 
nonparametric strategies for estimating the M-functional in a simple yet 
instructive setting in which X and M are both discrete with finite support. 

Strategy 1: The first strategy entails obtaining the maximum likelihood 
estimator upon evaluating the M-functional under the empirical law of the 
observed data, 

\[
\hat\theta_0^{ym} = \mathbb{P}_n \sum_{m \in \mathcal{S}} \hat{\mathbb{E}}(Y \mid E=1, M=m, X)\,
  \hat f_{M|E,X}(m \mid E=0, X),
\]

where $\hat f_{Y|E,M,X}$ and $\hat f_{M|E,X}$ are the empirical probability mass functions, 
and $\hat{\mathbb{E}}(Y \mid E=e, M=m, X=x)$ is the expectation of Y under $\hat f_{Y|E,M,X}$. 

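Strategy 2: The second strategy is based on the following alternative 
representation of the M-functional: 

\[
\begin{aligned}
&\iint_{\mathcal{S}\times\mathcal{X}} \mathbb{E}(Y \mid E=1, M=m, X=x)\, dF_{M|E,X}(m \mid E=0, X=x)\, dF_X(x) \\
&\qquad = \sum_{e \in \{0,1\}} \iint_{\mathcal{S}\times\mathcal{X}} \mathbb{E}(Y \mid E=1, M=m, X=x)\,
   \frac{I(e=0)}{f_{E|X}(e \mid X=x)}\, dF_{M,E,X}(m, e, x).
\end{aligned}
\]

Thus, our second estimator takes the form 

\[
\hat\theta_0^{ye} = \mathbb{P}_n \left\{ \frac{I(E=0)}{\hat f_{E|X}(E \mid X)}\,
  \hat{\mathbb{E}}(Y \mid E=1, M, X) \right\},
\]

with $\hat f_{E|X}$ the empirical estimate of the probability mass function $f_{E|X}$. 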

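Strategy 3: The last strategy is based on a third representation of the 
M-functional: 

\[
\begin{aligned}
&\iint_{\mathcal{S}\times\mathcal{X}} \mathbb{E}(Y \mid E=1, M=m, X=x)\, dF_{M|E,X}(m \mid E=0, X=x)\, dF_X(x) \\
&\qquad = \sum_{e \in \{0,1\}} \iiint y\, \frac{I(e=1)}{f_{E|X}(e \mid X=x)}\,
   \frac{f_{M|E,X}(m \mid E=0, X=x)}{f_{M|E,X}(m \mid E=e, X=x)}\, dF_{Y,M,E,X}(y, m, e, x) \\
&\qquad = \mathbb{E}\left\{ Y\, \frac{I(E=1)}{f_{E|X}(E \mid X)}\,
   \frac{f_{M|E,X}(M \mid E=0, X)}{f_{M|E,X}(M \mid E, X)} \right\}.
\end{aligned}
\]

Thus, our third estimator takes the form 

\[
\hat\theta_0^{em} = \mathbb{P}_n \left\{ Y\, \frac{I(E=1)}{\hat f_{E|X}(E \mid X)}\,
  \frac{\hat f_{M|E,X}(M \mid E=0, X)}{\hat f_{M|E,X}(M \mid E, X)} \right\}.
\]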

At first glance the three estimators $\hat\theta_0^{ym}$, $\hat\theta_0^{ye}$ and $\hat\theta_0^{em}$ might appear to 
be distinct; however, we observe that provided the empirical distribution 
function $\hat F_O = \hat F_{Y|E,M,X} \times \hat F_{M|E,X} \times \hat F_{E|X} \times \hat F_X$ satisfies the positivity 
assumption, and thus $\hat F_O \in \mathcal{M}_{\mathrm{nonpar}}$, then actually $\hat\theta_0^{ym} = \hat\theta_0^{ye} = \hat\theta_0^{em} = \theta_0(\hat F_O)$ 
since the three representations agree on the nonparametric model $\mathcal{M}_{\mathrm{nonpar}}$. 
Therefore we may conclude that these three estimators are in fact asymptotically 
efficient in $\mathcal{M}_{\mathrm{nonpar}}$ with common influence function $S^{\mathrm{eff,nonpar}}_{\theta_0}(\theta_0)$. 
Furthermore, from this observation, one further concludes that (asymptotic) 
inferences obtained using one of the three representations are identical to 
inferences using either of the other two representations. 
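The numerical equivalence of the three nonparametric strategies can be checked directly in the discrete setting. The sketch below simulates binary X, E and M from an arbitrary law (the coefficients are illustrative assumptions, not values from the paper), builds the empirical cell means and probability mass functions, and evaluates the three estimators.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 5000

# Illustrative data-generating law with binary X and M (not from the paper).
X = rng.binomial(1, 0.4, n)
E = rng.binomial(1, 0.3 + 0.4 * X)
M = rng.binomial(1, 0.2 + 0.3 * E + 0.2 * X)
Y = 1.0 + X - E + 2.0 * M + E * M + rng.normal(size=n)

levels = (0, 1)
# Empirical E(Y | E=1, M=m, X=x), indexed [m, x].
mu1 = np.array([[Y[(E == 1) & (M == m) & (X == x)].mean() for x in levels]
                for m in levels])
# Empirical P(M=m | E=e, X=x), indexed [m, e, x].
f_m = np.array([[[np.mean(M[(E == e) & (X == x)] == m) for x in levels]
                 for e in levels] for m in levels])
# Empirical P(E=1 | X=x), indexed [x].
p_e1 = np.array([E[X == x].mean() for x in levels])

# Strategy 1: plug-in of the outcome regression and the mediator density.
eta10 = (mu1 * f_m[:, 0, :]).sum(axis=0)           # indexed [x]
theta_ym = eta10[X].mean()

# Strategy 2: outcome regression weighted by the inverse exposure probability.
theta_ye = np.mean((E == 0) / (1.0 - p_e1[X]) * mu1[M, X])

# Strategy 3: outcome weighted by inverse exposure probability and density ratio.
theta_em = np.mean((E == 1) / p_e1[X] * f_m[M, 0, X] / f_m[M, 1, X] * Y)

print(theta_ym, theta_ye, theta_em)
```

With saturated empirical estimates the three numbers agree up to floating-point error; the agreement breaks down once any of the nuisance functions is replaced by a restricted parametric fit.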

At this juncture, we note that the above equivalence no longer applies 
when, as we have previously argued will likely occur in practice, (M, X) contains 
3 or more continuous variables and/or X is too high dimensional for 
models to be saturated or nonparametric, and thus parametric (or semiparametric) 
models are specified for dimension reduction. Specifically, for 
such settings, we observe that three distinct modeling strategies are available. 
Under the first strategy, the estimator $\hat\theta_0^{ym,\mathrm{par}}$ is obtained similarly to $\hat\theta_0^{ym}$ using 
parametric model estimates $\hat{\mathbb{E}}^{\mathrm{par}}(Y \mid E, M, X)$ and $\hat f^{\mathrm{par}}_{M|E,X}(m \mid E, X)$ instead 
of their nonparametric counterparts; similarly, under the second strategy, 
the estimator $\hat\theta_0^{ye,\mathrm{par}}$ is obtained similarly to $\hat\theta_0^{ye}$ using estimates of parametric 
models $\hat{\mathbb{E}}^{\mathrm{par}}(Y \mid E=1, M=m, X)$ and $\hat f^{\mathrm{par}}_{E|X}(e \mid X)$; and finally, under 
the third strategy, $\hat\theta_0^{em,\mathrm{par}}$ is obtained similarly to $\hat\theta_0^{em}$ using $\hat f^{\mathrm{par}}_{E|X}(e \mid X)$ and 
$\hat f^{\mathrm{par}}_{M|E,X}(m \mid E, X)$. Then, it follows that $\hat\theta_0^{ym,\mathrm{par}}$ is CAN under the submodel 
$\mathcal{M}_a$, but is generally inconsistent if either $\hat{\mathbb{E}}^{\mathrm{par}}(Y \mid E, M, X)$ or $\hat f^{\mathrm{par}}_{M|E,X}(m \mid E, X)$ 
fails to be consistent. Similarly, $\hat\theta_0^{ye,\mathrm{par}}$ and $\hat\theta_0^{em,\mathrm{par}}$ are, respectively, CAN 
under the submodels $\mathcal{M}_b$ and $\mathcal{M}_c$, but each estimator generally fails to be 
consistent outside of the corresponding submodel. In the next section, we 
propose an approach that produces a triply robust estimator by combining 
the above three strategies so that only one of models $\mathcal{M}_a$, $\mathcal{M}_b$ and $\mathcal{M}_c$ needs 
to be valid for consistency of the estimator. 

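2.3. Triply robust estimation. The proposed triply robust estimator $\hat\theta_0^{\mathrm{triply}}$ 
solves 

\[
\mathbb{P}_n \hat S^{\mathrm{eff,nonpar}}_{\theta_0}(\hat\theta_0^{\mathrm{triply}}) = 0,
\]

where $\hat S^{\mathrm{eff,nonpar}}_{\theta_0}(\theta)$ is equal to $S^{\mathrm{eff,nonpar}}_{\theta_0}(\theta)$ evaluated at $\{\hat{\mathbb{E}}^{\mathrm{par}}(Y \mid E, M, X)$, 
$\hat f^{\mathrm{par}}_{M|E,X}(m \mid E, X)$, $\hat f^{\mathrm{par}}_{E|X}(e \mid X)\}$; that is, 

\[
\begin{aligned}
\hat\theta_0^{\mathrm{triply}} = \mathbb{P}_n \Biggl[\,
 & \frac{I\{E=1\}\, \hat f^{\mathrm{par}}_{M|E,X}(M \mid E=0, X)}
        {\hat f^{\mathrm{par}}_{E|X}(1 \mid X)\, \hat f^{\mathrm{par}}_{M|E,X}(M \mid E=1, X)}
   \{Y - \hat{\mathbb{E}}^{\mathrm{par}}(Y \mid X, M, E=1)\} \\
 & + \frac{I\{E=0\}}{\hat f^{\mathrm{par}}_{E|X}(0 \mid X)}
   \{\hat{\mathbb{E}}^{\mathrm{par}}(Y \mid X, M, E=1) - \hat\eta^{\mathrm{par}}(1,0,X)\}
   + \hat\eta^{\mathrm{par}}(1,0,X) \Biggr]
\end{aligned}
\tag{4}
\]

is CAN in model $\mathcal{M}_{\mathrm{union}} = \mathcal{M}_a \cup \mathcal{M}_b \cup \mathcal{M}_c$, where 

\[
\hat\eta^{\mathrm{par}}(e, e^*, X) = \int_{\mathcal{S}} \hat{\mathbb{E}}^{\mathrm{par}}(Y \mid X, M=m, E=e)\,
  \hat f^{\mathrm{par}}_{M|E,X}(m \mid E=e^*, X)\, d\mu(m).
\]

In the next theorem, the estimator in the above display is combined with 
a doubly robust estimator $\hat\delta_e^{\mathrm{doubly}}$ of $\delta_e$ [see van der Laan and Robins (2003) 
or Tsiatis (2006)], to obtain multiply robust estimators of natural direct and 
indirect effects, where 

\[
\hat\delta_e^{\mathrm{doubly}} = \mathbb{P}_n \left[ \frac{I(E=e)}{\hat f^{\mathrm{par}}_{E|X}(e \mid X)}
  \{Y - \hat\eta^{\mathrm{par}}(e,e,X)\} + \hat\eta^{\mathrm{par}}(e,e,X) \right].
\]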
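As a rough illustration of how display (4) and the doubly robust estimator translate into computation, the sketch below evaluates both from arrays of fitted nuisance values for a discrete mediator. The function names, argument layout and toy inputs are assumptions made for this example; how the nuisance models are fitted is left open.

```python
import numpy as np


def theta_triply(Y, E, ey1_obs, ey1_by_m, fm0_by_m, fm0_obs, fm1_obs, pe1):
    """Evaluate display (4) from fitted nuisance values (discrete mediator).

    ey1_obs  : fitted E(Y | X_i, M_i, E=1), shape (n,)
    ey1_by_m : fitted E(Y | X_i, M=m, E=1) per mediator level, shape (n, K)
    fm0_by_m : fitted f(m | E=0, X_i) per mediator level, shape (n, K)
    fm0_obs  : fitted f(M_i | E=0, X_i), shape (n,)
    fm1_obs  : fitted f(M_i | E=1, X_i), shape (n,)
    pe1      : fitted f(1 | X_i) for the exposure, shape (n,)
    """
    eta10 = (ey1_by_m * fm0_by_m).sum(axis=1)          # eta-hat(1, 0, X_i)
    term_y = (E == 1) * fm0_obs / (pe1 * fm1_obs) * (Y - ey1_obs)
    term_m = (E == 0) / (1.0 - pe1) * (ey1_obs - eta10)
    return np.mean(term_y + term_m + eta10)


def delta_doubly(Y, E, e, eta_ee, pe):
    """Doubly robust estimate of delta_e; pe is the fitted f(e | X_i)."""
    return np.mean((E == e) / pe * (Y - eta_ee) + eta_ee)


# Toy inputs, chosen only to exercise the code (n = 4, binary mediator).
Y = np.array([2.0, 2.0, 3.0, 4.0])
E = np.array([1, 0, 1, 0])
ey1_by_m = np.tile([1.0, 3.0], (4, 1))
fm0_by_m = np.full((4, 2), 0.5)
theta_hat = theta_triply(Y, E,
                         ey1_obs=np.array([1.0, 3.0, 3.0, 1.0]),
                         ey1_by_m=ey1_by_m, fm0_by_m=fm0_by_m,
                         fm0_obs=np.full(4, 0.5), fm1_obs=np.full(4, 0.25),
                         pe1=np.full(4, 0.5))
delta0_hat = delta_doubly(Y, E, e=0, eta_ee=np.ones(4), pe=np.full(4, 0.5))
nde_hat = theta_hat - delta0_hat    # estimate of the natural direct effect
```

Passing fitted values rather than fitted models keeps the sketch agnostic about the working models; for a continuous mediator the sum defining `eta10` would have to be replaced by a numerical integral.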



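To state the result, we set $\hat{\mathbb{E}}^{\mathrm{par}}(Y \mid X, M, E) = \mathbb{E}^{\mathrm{par}}(Y \mid X, M, E; \hat\beta_y) = 
g^{-1}(\hat\beta_y^T h(X, M, E))$, where g is a known link function, and h is a user-specified 
function of (X, M, E) so that $\mathbb{E}^{\mathrm{par}}(Y \mid X, M, E; \beta_y) = g^{-1}(\beta_y^T h(X, M, E))$ 
entails a working regression model for $\mathbb{E}(Y \mid X, M, E)$, and $\hat\beta_y$ solves the 
estimating equation 

\[
0 = \mathbb{P}_n[S_y(\hat\beta_y)] = \mathbb{P}_n[h(X, M, E)\{Y - g^{-1}(\hat\beta_y^T h(X, M, E))\}].
\]

Similarly, we set $\hat f^{\mathrm{par}}_{M|E,X}(m \mid E, X) = f^{\mathrm{par}}_{M|E,X}(m \mid E, X; \hat\beta_m)$ for $f^{\mathrm{par}}_{M|E,X}(m \mid E, X; 
\beta_m)$, a parametric model for the density of [M|E,X] with $\hat\beta_m$ solving 

\[
0 = \mathbb{P}_n[S_m(\hat\beta_m)] = \mathbb{P}_n\left[ \frac{\partial}{\partial \beta_m}
  \log f^{\mathrm{par}}_{M|E,X}(M \mid E, X; \hat\beta_m) \right],
\]

and we set $\hat f^{\mathrm{par}}_{E|X}(e \mid X) = f^{\mathrm{par}}_{E|X}(e \mid X; \hat\beta_e)$ for $f^{\mathrm{par}}_{E|X}(e \mid X; \beta_e)$, a parametric model 
for the density of [E|X] with $\hat\beta_e$ solving 

\[
0 = \mathbb{P}_n[S_e(\hat\beta_e)] = \mathbb{P}_n\left[ \frac{\partial}{\partial \beta_e}
  \log f^{\mathrm{par}}_{E|X}(E \mid X; \hat\beta_e) \right].
\]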

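Theorem 2. Suppose that the assumptions of Theorem 1 hold, and that 
the regularity conditions stated in the Appendix hold and that $\beta_m$, $\beta_e$ and $\beta_y$ 
are variation independent. 

(i) Mediation functional: Then, $\sqrt{n}(\hat\theta_0^{\mathrm{triply}} - \theta_0)$ is RAL under model 
$\mathcal{M}_{\mathrm{union}}$ with influence function 

\[
S^{\mathrm{union}}_{\theta_0}(\theta_0, \beta^*) = S^{\mathrm{eff,nonpar}}_{\theta_0}(\theta_0, \beta^*)
 - \left. \frac{\partial \mathbb{E}\{S^{\mathrm{eff,nonpar}}_{\theta_0}(\theta_0, \beta)\}}{\partial \beta^T} \right|_{\beta^*}
   \mathbb{E}\left\{ \left. \frac{\partial S_\beta(\beta)}{\partial \beta^T} \right|_{\beta^*} \right\}^{-1}
   S_\beta(\beta^*),
\]

and thus converges in distribution to a $N(0, \Sigma_{\theta_0})$, where 

\[
\Sigma_{\theta_0}(\theta_0, \beta^*) = \mathbb{E}(S^{\mathrm{union}}_{\theta_0}(\theta_0, \beta^*)^2),
\]

with $\beta^T = (\beta_m^T, \beta_e^T, \beta_y^T)$ and $S_\beta(\beta) = (S_m^T(\beta_m), S_e^T(\beta_e), S_y^T(\beta_y))^T$, and with $\beta^*$ 
denoting the probability limit of the estimator $\hat\beta = (\hat\beta_m^T, \hat\beta_e^T, \hat\beta_y^T)^T$. 

(ii) Natural direct effect: Similarly, $\sqrt{n}(\hat\theta_0^{\mathrm{triply}} - \hat\delta_0^{\mathrm{doubly}} - (\theta_0 - \delta_0))$ is 
RAL under model $\mathcal{M}_{\mathrm{union}}$ with influence function $S^{\mathrm{union}}_{\mathrm{NDE}}(\theta_0, \delta_0, \beta^*)$ defined 
as $S^{\mathrm{union}}_{\theta_0}(\theta_0, \beta^*)$ with $S^{\mathrm{eff,nonpar}}_{\mathrm{NDE}}(\theta_0, \delta_0, \beta^*)$ replacing $S^{\mathrm{eff,nonpar}}_{\theta_0}(\theta_0, \beta^*)$, and 
asymptotic variance $\Sigma_{\theta_0 - \delta_0}(\theta_0, \delta_0, \beta^*)$ defined accordingly. 

(iii) Natural indirect effect: Similarly, $\sqrt{n}(\hat\delta_1^{\mathrm{doubly}} - \hat\theta_0^{\mathrm{triply}} - (\delta_1 - \theta_0))$ is 
RAL under model $\mathcal{M}_{\mathrm{union}}$ with influence function $S^{\mathrm{union}}_{\mathrm{NIE}}(\delta_1, \theta_0, \beta^*)$ defined 
as $S^{\mathrm{union}}_{\theta_0}(\theta_0, \beta^*)$ with $S^{\mathrm{eff,nonpar}}_{\mathrm{NIE}}(\delta_1, \theta_0, \beta^*)$ replacing $S^{\mathrm{eff,nonpar}}_{\theta_0}(\theta_0, \beta^*)$, and 
asymptotic variance $\Sigma_{\delta_1 - \theta_0}(\delta_1, \theta_0, \beta^*)$ defined accordingly. 

(iv) $\hat\theta_0^{\mathrm{triply}}$, $\hat\theta_0^{\mathrm{triply}} - \hat\delta_0^{\mathrm{doubly}}$ and $\hat\delta_1^{\mathrm{doubly}} - \hat\theta_0^{\mathrm{triply}}$ are semiparametric locally 
efficient in the sense that they are RAL under model $\mathcal{M}_{\mathrm{union}}$ and respectively 
achieve the semiparametric efficiency bound for $\theta_0$, $\theta_0 - \delta_0$, and $\delta_1 - \theta_0$ 
under model $\mathcal{M}_{\mathrm{union}}$ at the intersection submodel $\mathcal{M}_a \cap \mathcal{M}_b \cap \mathcal{M}_c$, with respective 
efficient influence functions: $S^{\mathrm{eff,nonpar}}_{\theta_0}(\theta_0, \beta^*)$, $S^{\mathrm{eff,nonpar}}_{\mathrm{NDE}}(\theta_0, \delta_0, \beta^*)$ 
and $S^{\mathrm{eff,nonpar}}_{\mathrm{NIE}}(\delta_1, \theta_0, \beta^*)$. 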

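Empirical versions of $\Sigma_{\theta_0 - \delta_0}(\theta_0, \delta_0, \beta^*)$ and $\Sigma_{\delta_1 - \theta_0}(\delta_1, \theta_0, \beta^*)$ are easily 
obtained, and the corresponding Wald-type confidence intervals can be used 
to make formal inferences about natural direct and indirect effects. It is also 
straightforward to extend the approach to the risk ratio and odds ratio scales 
for binary Y. By a theorem due to Robins and Rotnitzky (2001), part (iv) of 
the theorem implies that when all models are correct, $\hat\theta_0^{\mathrm{triply}}$, $\hat\theta_0^{\mathrm{triply}} - \hat\delta_0^{\mathrm{doubly}}$ 
and $\hat\delta_1^{\mathrm{doubly}} - \hat\theta_0^{\mathrm{triply}}$ are semiparametric efficient in model $\mathcal{M}_{\mathrm{nonpar}}$ at the 
intersection submodel $\mathcal{M}_a \cap \mathcal{M}_b \cap \mathcal{M}_c$. 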

3. A simulation study of estimators of direct effect. In this section, we 
report a simulation study which illustrates the finite sample performance 
of the various estimators described in previous sections. We generated 1000 
samples of size n = 600, 1000 from the following model: 

(Model.X) $X_1 \sim \mathrm{Bernoulli}(0.4)$; $[X_2 \mid X_1] \sim \mathrm{Bernoulli}(0.2 + 0.4X_1)$; 
          $[X_3 \mid X_1, X_2] \sim -0.024 - 0.4X_1 + 0.4X_2 + N(0, 1)$; 

(Model.E) $[E \mid X_1, X_2, X_3] \sim \mathrm{Bernoulli}([1 + \exp\{-(0.4 + X_1 - X_2 + 0.1X_3 - 1.5X_1X_3)\}]^{-1})$; 

(Model.M) $[M \mid E, X_1, X_2, X_3] \sim \mathrm{Bernoulli}([1 + \exp\{-(0.5 - X_1 + 0.5X_2 - 0.9X_3 + E - 1.5X_1X_3)\}]^{-1})$; 

(Model.Y) $[Y \mid M, E, X_1, X_2, X_3] \sim 1 + 0.2X_1 + 0.3X_2 + 1.4X_3 - 2.5E - 3.5M + 5EM + N(0, 1)$. 

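We then evaluated the performance of the following four estimators of the 
natural direct effect: $\hat\theta_0^{ym,\mathrm{par}} - \hat\delta_0^{\mathrm{doubly}}$, $\hat\theta_0^{ye,\mathrm{par}} - \hat\delta_0^{\mathrm{doubly}}$, $\hat\theta_0^{em,\mathrm{par}} - \hat\delta_0^{\mathrm{doubly}}$ and 
$\hat\theta_0^{\mathrm{triply}} - \hat\delta_0^{\mathrm{doubly}}$. Note that the doubly robust estimator $\hat\delta_0^{\mathrm{doubly}}$ was used throughout 
to estimate $\delta_0 = \mathbb{E}(Y_0)$. To assess the impact of modeling error, we evaluated 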
these estimators in four separate scenarios. In the first scenario, all models 
were correctly specified, whereas the remaining three scenarios respectively 
mis-specified only one of Model E, Model M and Model Y. In order to mis- 
specify Model E and Model M, we respectively left out the X1X3 interaction 
when fitting each model, and we assumed an incorrect log-log link function. 
The incorrect model for Y simply assumed no EM interaction. 
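For readers who wish to reproduce the design, the following sketch draws samples from Model.X, Model.E, Model.M and Model.Y. The function names, the seed and the number of replicates drawn here are assumptions of this example, and the intercept 0.2 in the model for X2 is transcribed from a partly garbled line of the source, so it should be checked against the published article.

```python
import numpy as np


def expit(z):
    """Inverse logit."""
    return 1.0 / (1.0 + np.exp(-z))


def simulate(n, rng):
    """Draw one sample of size n from Model.X, Model.E, Model.M and Model.Y."""
    x1 = rng.binomial(1, 0.4, n)
    # The intercept 0.2 is as read from the source text, which is garbled here.
    x2 = rng.binomial(1, 0.2 + 0.4 * x1)
    x3 = -0.024 - 0.4 * x1 + 0.4 * x2 + rng.normal(size=n)
    e = rng.binomial(1, expit(0.4 + x1 - x2 + 0.1 * x3 - 1.5 * x1 * x3))
    m = rng.binomial(1, expit(0.5 - x1 + 0.5 * x2 - 0.9 * x3 + e - 1.5 * x1 * x3))
    y = (1.0 + 0.2 * x1 + 0.3 * x2 + 1.4 * x3
         - 2.5 * e - 3.5 * m + 5.0 * e * m + rng.normal(size=n))
    return {"x1": x1, "x2": x2, "x3": x3, "e": e, "m": m, "y": y}


rng = np.random.default_rng(2012)                  # seed is arbitrary
samples = [simulate(600, rng) for _ in range(3)]   # the study used 1000 replicates
```

Each element of `samples` is one simulated data set; the mis-specified scenarios would then be obtained by fitting working models that omit the X1X3 interaction, use the wrong link, or omit the EM interaction.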

Tables 1 and 2 summarize the simulation results, which largely agree
with the theory developed in the previous sections. Mainly, all proposed
estimators performed well at both moderate and large sample sizes in the
absence of modeling error. Furthermore, under the partially mis-specified
model in which Model.Y was incorrect, both estimators
$\hat\theta_0^{ym} - \hat\delta_0^{doubly}$ and $\hat\theta_0^{ye} - \hat\delta_0^{doubly}$
showed significant bias irrespective of sample size, while
$\hat\theta_0^{em} - \hat\delta_0^{doubly}$ and $\hat\theta_0^{triply} - \hat\delta_0^{doubly}$
performed well. Similarly, when Model M was incorrect, the estimators
$\hat\theta_0^{ym} - \hat\delta_0^{doubly}$ and $\hat\theta_0^{em} - \hat\delta_0^{doubly}$
resulted in large bias, when compared to the relatively small bias of
$\hat\theta_0^{ye} - \hat\delta_0^{doubly}$ and $\hat\theta_0^{triply} - \hat\delta_0^{doubly}$.
Finally, mis-specifying Model E led to estimators
$\hat\theta_0^{ye} - \hat\delta_0^{doubly}$ and $\hat\theta_0^{em} - \hat\delta_0^{doubly}$
that were significantly more biased than the estimators
$\hat\theta_0^{ym} - \hat\delta_0^{doubly}$ and $\hat\theta_0^{triply} - \hat\delta_0^{doubly}$.
Interestingly, the efficiency loss of the multiply robust estimator remained
relatively small when compared to the consistent nonrobust estimator under
the various scenarios, suggesting that, at least in this simulation study,
the benefits of robustness appear to outweigh the loss of efficiency.

Table 1
Simulation results, n = 600

                          M_ym      M_ye      M_em      M_union
All correct   bias        0.002     0.008     0.002     0.005
              MC s.e.*    0.005     0.007     0.006     0.006
Y wrong       bias       -0.500    -0.500     0.0001    0.004
              MC s.e.     0.005     0.006     0.006     0.006
M wrong       bias        0.038     0.008    -0.054     0.003
              MC s.e.     0.005     0.007     0.006     0.006
E wrong       bias        0.003     0.027     0.059     0.004
              MC s.e.     0.005     0.005     0.005     0.005

M_ym: $\hat\theta_0^{ym} - \hat\delta_0^{doubly}$; M_ye: $\hat\theta_0^{ye} - \hat\delta_0^{doubly}$;
M_em: $\hat\theta_0^{em} - \hat\delta_0^{doubly}$; M_union: $\hat\theta_0^{triply} - \hat\delta_0^{doubly}$.
* Monte Carlo standard error.
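The two summaries reported in Tables 1 and 2 can be computed from replicate estimates along the following lines. This is only a sketch: the helper name `mc_summary` is hypothetical, and it assumes that "MC s.e." denotes the empirical standard deviation of the estimator across simulated samples, a convention the text does not spell out.

```python
import numpy as np

def mc_summary(estimates, truth):
    """Summarize replicate estimates of a scalar parameter.

    estimates : one estimate per simulated sample
    truth     : the known value of the target parameter
    Returns (bias, mc_se), where mc_se is ASSUMED to be the empirical
    standard deviation of the estimates across replicates.
    """
    estimates = np.asarray(estimates, dtype=float)
    bias = estimates.mean() - truth      # Monte Carlo bias
    mc_se = estimates.std(ddof=1)        # spread across replicates
    return bias, mc_se
```

For instance, `mc_summary(replicate_estimates, true_direct_effect)` would be called once per estimator and per scenario, with both argument names being placeholders.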

4. A data application. In this section, we illustrate the methods in a real 
world application from the psychology literature on mediation. We re-analyze 
data from The Job Search Intervention Study (JOBS II) also analyzed by 
Imai, Keele and Tingley (2010). JOBS II is a randomized field experiment 

Table 2
Simulation results, n = 1000

                          M_ym      M_ye      M_em      M_union
All correct   bias        0.001     0.009     0.001     0.001
              MC s.e.*    0.004     0.005     0.004     0.004
Y wrong       bias       -0.484    -0.484     0.003     0.003
              MC s.e.     0.004     0.004     0.004     0.004
M wrong       bias        0.136    -0.008     0.056     0.01
              MC s.e.     0.004     0.05      0.004     0.01
E wrong       bias        0.001    -0.024    -0.054     0.001
              MC s.e.     0.004     0.004     0.004     0.004

M_ym: $\hat\theta_0^{ym} - \hat\delta_0^{doubly}$; M_ye: $\hat\theta_0^{ye} - \hat\delta_0^{doubly}$;
M_em: $\hat\theta_0^{em} - \hat\delta_0^{doubly}$; M_union: $\hat\theta_0^{triply} - \hat\delta_0^{doubly}$.
* Monte Carlo standard error.

Table 3
Estimated causal effects of interest using the job search intervention study data

                              M_ym       M_ye       M_em       M_union
Direct effect     Estimate   -0.0310    -0.0310     0.0280    -0.0409
                  s.e.*       0.0124     0.0620     0.0465     0.0217
Indirect effect   Estimate   -0.0160    -0.0160    -0.0750    -0.0070
                  s.e.*       0.0372     0.0620     0.0434     0.0217

* Nonparametric bootstrap standard errors.

that investigates the efficacy of a job training intervention on unemployed
workers. The program is designed not only to increase reemployment among
the unemployed but also to enhance the mental health of the job seekers.
In the study, 1801 unemployed workers received a pre-screening questionnaire
and were then randomly assigned to treatment and control groups.
The treatment group with $E = 1$ participated in job skills workshops in
which participants learned job search skills and coping strategies for dealing
with setbacks in the job search process. The control group with $E = 0$
received a booklet describing job search tips. The analysis considers a
continuous outcome measure $Y$ of depressive symptoms based on the Hopkins
Symptom Checklist [Imai, Keele and Tingley (2010)]. In the JOBS II data,
a continuous measure of job search self-efficacy represented the hypothesized
mediating variable $M$. The data also included baseline covariates $X$
measured before administering the treatment, including pretreatment level
of depression, education, income, race, marital status, age, sex, previous
occupation, and the level of economic hardship.

Note that by randomization, the density of $[E \mid X]$ was known by design not
to depend on covariates, and therefore its estimation is not prone to modeling
error. The continuous outcome and mediator variables were modeled using
linear regression models with Gaussian error, with main effects for $(E, M, X)$
included in the outcome regression and main effects for $(E, X)$ included in
the mediator regression. Table 3 summarizes results obtained using
$\hat\theta_0^{ym}$, $\hat\theta_0^{ye}$, $\hat\theta_0^{em}$ and $\hat\theta_0^{triply}$
together with $\hat\delta_e^{doubly}$, $e = 0, 1$, to estimate the direct
and indirect effects of the treatment.

Point estimates of both natural direct and indirect effects closely agreed
under models $\mathcal{M}_{ym}$ and $\mathcal{M}_{ye}$, and also agreed with the
results of Imai, Keele and Tingley (2010). We should note that inferences under
our choice of $\mathcal{M}_{ym}$ are actually robust to the normality assumption
and, as in Imai, Keele and Tingley (2010), only require that the mean structure
of $[Y \mid E, M, X]$ and $[M \mid E, X]$ is correct. In contrast, inferences
under model $\mathcal{M}_{em}$ require a correct model for the mediator density.
This distinction may partly explain the apparent disagreement in the estimated
direct effect under $\mathcal{M}_{em}$ when compared to the other methods, also
suggesting that the Gaussian error model for $M$ is not entirely appropriate.
The multiply robust estimate of the natural direct effect is consistent with
estimates obtained under models $\mathcal{M}_{ym}$ and $\mathcal{M}_{ye}$, and is
statistically significant, suggesting that the intervention may have beneficial
direct effects on participants' mental health; while the multiply robust
approach suggests a much smaller indirect effect than all other estimators,
although none achieved statistical significance.

5. Improving the stability of $\hat\theta_0^{triply}$ when weights are highly variable.

The triply robust estimator $\hat\theta_0^{triply}$, which involves inverse
probability weights for the exposure and mediator variables, clearly relies on
the positivity assumption for good finite sample performance. But as recently
shown by Kang and Schafer (2007) in the context of missing outcome data, a
practical violation of positivity in data analysis can severely compromise
inferences based on such methodology, although their analysis did not directly
concern the M-functional $\theta_0$. Thus, it is crucial to critically examine,
as we do below in a simulation study, the extent to which the various estimators
discussed in this paper are susceptible to a practical violation of the
positivity assumption, and to consider possible approaches to improve the finite
sample performance of these estimators in the context of highly variable
empirical weights. Methodology to enhance the finite sample behavior of
$\hat\delta_e^{doubly}$ is well studied in the literature and is not considered
here; see, for example, Robins et al. (2007), Cao, Tsiatis and Davidian (2009)
and Tan (2010). We first describe an approach to enhance the finite sample
performance of $\hat\theta_0^{triply}$, particularly in the presence of highly
variable empirical weights. To focus the exposition, we only consider the case
of a continuous $Y$ and a binary $M$, but in principle, the approach could be
generalized to a more general setting.
The proposed enhancement involves two modifications.
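Before turning to the two modifications, it can help to quantify how variable a set of empirical weights is. The sketch below is not part of the proposed methodology: the function name is hypothetical, and the two diagnostics (largest weight share and the Kish effective sample size) are generic tools assumed here for illustration.

```python
import numpy as np

def weight_diagnostics(weights):
    """Generic diagnostics for inverse probability weights (illustrative).

    Returns the share of total weight carried by the largest weight and
    the Kish effective sample size (sum w)^2 / sum(w^2). A small effective
    sample size relative to n signals a practical positivity problem.
    """
    w = np.asarray(weights, dtype=float)
    ess = w.sum() ** 2 / np.sum(w ** 2)
    return {"max_share": w.max() / w.sum(), "ess": ess}
```

With equal weights the effective sample size equals the number of observations; a single dominant weight drives it toward one.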

The first modification adapts to the mediation context an approach developed
for the missing data context (and for the estimation of total effects)
in Robins et al. (2007). The basic guiding principle of the approach is to
carefully modify the estimation of the outcome and mediator models in order
to ensure that the triply robust estimator given by equation (4) has the
simple M-functional representation

$$\hat\theta_0^{triply,\dagger} = \mathbb{P}_n\{\hat\eta^{par,\dagger}(1,0,X)\},$$

where $\hat\eta^{par,\dagger}(1,0,X)$ is carefully estimated to ensure multiple
robustness. The reason for favoring an estimator with the above representation
is that it is expected to be more robust to practical positivity violation
because it does not directly depend on inverse probability weights. However, as
we show next, to ensure multiple robustness, estimation of
$\hat\eta^{par,\dagger}$ involves inverse probability weights, and therefore
$\hat\theta_0^{triply,\dagger}$ indirectly depends on such weights.
Our strategy involves a second step to minimize the potential impact of this
indirect dependence on weights.

In the following, we assume, to simplify the exposition, that a simple
linear model is used:

$$\hat{E}^{par}(Y \mid X, M, E=1) = E^{par}(Y \mid X, M, E=1; \hat\beta_y) = [1, X^T, M]\hat\beta_y.$$

Then, similar to Robins et al. (2007), one can verify that the above
M-functional representation of a triply robust estimator is obtained by
estimating $f^{par}_{M|E,X}(M \mid E=0, X)$ with
$\hat f^{par,\dagger}_{M|E,X}(M \mid E=0, X)$ obtained via weighted
logistic regression in the unexposed-only, with weight
$\hat f^{par}_{E|X}(0 \mid X)^{-1}$; and by
estimating $E^{par}(Y \mid X, M, E=1)$ using weighted OLS of $Y$ on $(M, X)$ in
the exposed-only, with weight

$$\hat f^{par,\dagger}_{M|E,X}(M \mid E=0, X)\,\{\hat f^{par}_{E|X}(1 \mid X)\,\hat f^{par}_{M|E,X}(M \mid E=1, X)\}^{-1},$$

provided that both working models include an intercept. The second
enhancement, to minimize undue influence of variable weights on the
M-functional estimator, entails using $\hat f^{\dagger}_{E|X}$ in the previous
step instead of $\hat f^{par}_{E|X}$, where

$$\operatorname{logit} \hat f^{\dagger}_{E|X}(1 \mid X) = \operatorname{logit} \hat f^{par}_{E|X}(1 \mid X) + \hat C_1$$

with

$$\hat C_1 = -\log(1 - \mathbb{P}_n(E)) + \log\big(\mathbb{P}_n[E \hat f^{par}_{E|X}(0 \mid X)/\hat f^{par}_{E|X}(1 \mid X)]\big).$$

This second modification ensures a certain boundedness property of inverse
propensity score-weighting. Specifically, for any bounded function
$R = r(Y, M)$ of $Y$ and $M$, consider for a moment the goal of estimating the
counterfactual mean $E\{r(Y_1, M_1)\}$; then it is well known that even though
$R$ is bounded, the simple inverse-probability weighting estimator
$\mathbb{P}_n\{E R \hat f^{par}_{E|X}(1 \mid X)^{-1}\}$ could easily be
unbounded, particularly if positivity is practically violated. In contrast, as
we show next, the estimator
$\mathbb{P}_n\{E R \hat f^{\dagger}_{E|X}(1 \mid X)^{-1}\}$
is generally bounded. To see why, note that

$$
\begin{aligned}
\mathbb{P}_n\{E R \hat f^{\dagger}_{E|X}(1 \mid X)^{-1}\}
&= \mathbb{P}_n\{E R \hat f^{\dagger}_{E|X}(0 \mid X)/\hat f^{\dagger}_{E|X}(1 \mid X)\} + \mathbb{P}_n\{E R\} \\
&= \mathbb{P}_n\left\{ R \, \frac{E \hat f^{\dagger}_{E|X}(0 \mid X)/\hat f^{\dagger}_{E|X}(1 \mid X)}{\mathbb{P}_n[E \hat f^{\dagger}_{E|X}(0 \mid X)/\hat f^{\dagger}_{E|X}(1 \mid X)]} \right\} (1 - \mathbb{P}_n(E)) + \mathbb{P}_n\{E R\},
\end{aligned}
$$

which is bounded since the second term is bounded, and the first term is
a convex combination of bounded variables, and therefore is also bounded.
Furthermore, $\mathbb{P}_n[E \hat f^{\dagger}_{E|X}(0 \mid X) \hat f^{\dagger}_{E|X}(1 \mid X)^{-1}]$
converges in probability to $(1 - E(E))$ provided that $\hat f^{par}_{E|X}$
converges to $f_{E|X}$, ensuring that the expression in
the above display is consistent for $E\{r(Y_1, M_1)\}$. The nonparametric
bootstrap is most convenient for inference using $\hat f^{\dagger}_{E|X}$.

In the next section, we study, in the context of highly variable weights,
the behavior of our previous estimators of $\theta_0$ together with that of the
enhanced estimators
$\hat\theta_0^{triply,\dagger,j} = \mathbb{P}_n\{\hat\eta^{par,\dagger,j}(1,0,X)\}$,
$j = 1, 2$, where $\hat\eta^{par,\dagger,1}$
is constructed as described above using $\hat f^{par}_{E|X}$, and
$\hat\eta^{par,\dagger,2}$ uses $\hat f^{\dagger}_{E|X}$.
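The second modification of Section 5 (the intercept shift of the fitted exposure model) and the resulting boundedness property can be checked numerically. The following is a minimal sketch under stated assumptions: the function names are hypothetical, `p1` stands for any fitted values of the probability that E = 1 given X, and the shift is computed from the formula for the constant given in Section 5.

```python
import numpy as np

def calibrate_propensity(E, p1):
    """Shift logit(p1) by the constant C1 described in Section 5.

    E  : 0/1 exposure indicators
    p1 : fitted probabilities P(E=1 | X) from any working model
    Returns the shifted probabilities for E=1.
    """
    E = np.asarray(E, dtype=float)
    p1 = np.asarray(p1, dtype=float)
    odds0 = (1.0 - p1) / p1                       # fitted f(0|X) / f(1|X)
    c1 = -np.log(1.0 - E.mean()) + np.log(np.mean(E * odds0))
    logit_shifted = np.log(p1 / (1.0 - p1)) + c1
    return 1.0 / (1.0 + np.exp(-logit_shifted))

def ipw_mean(E, R, p1):
    """Inverse-probability-weighted mean of R among the exposed."""
    return np.mean(np.asarray(E) * np.asarray(R) / np.asarray(p1))
```

After the shift, the empirical mean of E times the shifted odds of non-exposure equals one minus the sample proportion exposed, which is why `ipw_mean` evaluated at the shifted probabilities stays within the range of R even when some fitted probabilities are tiny.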

6. A simulation study where positivity is practically violated. We adapted
to the mediation setting the missing data simulation scenarios in Kang
and Schafer (2007), which were specifically designed so that, when misspecified,
working models are nonetheless nearly correct, but yield highly variable
inverse probability weights with practical positivity violation in the context
of estimation. We generated 1000 samples of size n = 200, 1000 from the
following model:

(Model.X) $Z = (Z_1, Z_2, Z_3, Z_4) \sim N(0, I)$; $X_1 = \exp(Z_1/2)$;
$X_2 = Z_2/\{1 + \exp(Z_1)\} + 10$; $X_3 = (Z_1 Z_3/25 + 0.6)^3$
and $X_4 = (Z_2 + Z_4 + 20)^2$, so that $Z$ may be expressed in terms
of $X$.

(Model.E) $[E \mid X_1, X_2, X_3] \sim \mathrm{Bernoulli}([1 + \exp\{(Z_1 - 0.5Z_2 + 0.25Z_3 + 0.1Z_4)\}]^{-1})$;

(Model.M) $[M \mid E, X_1, X_2, X_3] \sim \mathrm{Bernoulli}([1 + \exp\{-(0.5 - Z_1 + 0.5Z_2 - 0.9Z_3 + Z_4 - 1.5E)\}]^{-1})$;

(Model.Y) $[Y \mid M, E, X_1, X_2, X_3] \sim 210 + 27.4Z_1 + 13.7Z_2 + 13.7Z_3 + M + E + N(0, 1)$.

Correctly specified working models were thus achieved by fitting an additive
linear regression of Y on Z, a logistic regression of M with linear predictor
additive in Z and E, and a logistic regression of E with linear predictor
additive in Z, respectively. Incorrect specification involved fitting these
models with X replacing Z, which produces highly variable weights. For
instance, an estimated propensity score as small as 5.5 x 10^[illegible]
occurred in the simulation study, reflecting an effective violation of
positivity; similarly, a mediator predicted probability as small as
3 x 10^[illegible] also occurred in the simulation study.
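The data-generating mechanism of this section can be sketched as follows. Several details are assumptions rather than facts taken from the text: the function names are hypothetical, the fourth covariate is taken to depend on Z_4 (needed for Z to be recoverable from X), and the outcome mean uses the Z_2 and Z_3 terms as read from the extracted text, whereas Kang and Schafer's original outcome model also carries a Z_4 term.

```python
import numpy as np

def expit(u):
    return 1.0 / (1.0 + np.exp(-u))

def simulate(n, rng):
    """Draw one sample of size n from the Section 6 design (as read here)."""
    Z = rng.standard_normal((n, 4))
    z1, z2, z3, z4 = Z.T
    # Nonlinear transformations used when working models are mis-specified
    X = np.column_stack([
        np.exp(z1 / 2),
        z2 / (1 + np.exp(z1)) + 10,
        (z1 * z3 / 25 + 0.6) ** 3,
        (z2 + z4 + 20) ** 2,          # ASSUMED to involve Z_4
    ])
    # Model.E: P(E=1) = 1 / (1 + exp(z1 - 0.5 z2 + 0.25 z3 + 0.1 z4))
    E = rng.binomial(1, expit(-(z1 - 0.5 * z2 + 0.25 * z3 + 0.1 * z4)))
    # Model.M: logistic in Z and E
    M = rng.binomial(1, expit(0.5 - z1 + 0.5 * z2 - 0.9 * z3 + z4 - 1.5 * E))
    # Model.Y: linear in Z, M and E with unit-variance Gaussian noise
    # (Z_2 and Z_3 coefficients ASSUMED as read from the text)
    Y = 210 + 27.4 * z1 + 13.7 * z2 + 13.7 * z3 + M + E + rng.standard_normal(n)
    return Z, X, E, M, Y
```

Fitting the working models on the columns of `X` instead of `Z` would then reproduce the mis-specified scenarios.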

Tables 4 and 5 summarize simulation results for $\hat\theta_0^{ym}$,
$\hat\theta_0^{ye}$, $\hat\theta_0^{em}$, $\hat\theta_0^{triply}$,
$\hat\theta_0^{triply,\dagger,1}$ and $\hat\theta_0^{triply,\dagger,2}$. When all
three working models are correct, all estimators perform well in terms of bias,
but there are clear differences between the estimators in terms of efficiency.
In fact, $\hat\theta_0^{ym}$, $\hat\theta_0^{triply}$,
$\hat\theta_0^{triply,\dagger,1}$ and $\hat\theta_0^{triply,\dagger,2}$ have
comparable efficiency for n = 200, 1000, but $\hat\theta_0^{ye}$ and
$\hat\theta_0^{em}$ are far more variable. Moreover, under mis-specification of
a single model, $\hat\theta_0^{triply}$, $\hat\theta_0^{triply,\dagger,1}$ and
$\hat\theta_0^{triply,\dagger,2}$ remain nearly unbiased, and for the most part
substantially more efficient than the corresponding consistent estimator in
$\{\hat\theta_0^{ym}, \hat\theta_0^{ye}, \hat\theta_0^{em}\}$. When at least two
models are mis-specified, the multiply robust estimators
$\hat\theta_0^{triply}$, $\hat\theta_0^{triply,\dagger,1}$ and
$\hat\theta_0^{triply,\dagger,2}$ generally outperform
the other estimators, although $\hat\theta_0^{triply}$ occasionally succumbs to
the unstable weights, resulting in disastrous mean squared error; see Table 5
when Model M and Model E are both incorrect. In contrast,
$\hat\theta_0^{triply,\dagger,2}$ generally improves on
$\hat\theta_0^{triply,\dagger,1}$, which generally outperforms
$\hat\theta_0^{triply}$, and for the most part
$\hat\theta_0^{triply,\dagger,1}$ and $\hat\theta_0^{triply,\dagger,2}$ appear to
eliminate any possible deleterious impact of highly variable weights.

Table 4
Simulation results, n = 200

                            M_ym       M_ye       M_em        M_union     M_union(†,1)  M_union(†,2)
All correct    bias         0.001     -0.207      0.498       0.003      -0.08         -0.079
               MC s.e.*     2.614      8.333     20.214       2.6151      2.6155        2.6153
Y wrong        bias        -9.87     -10.221      0.498      -0.147      -0.502        -0.202
               MC s.e.      3.322     10.539     20.214       4.461       3.177         3.141
M wrong        bias        -0.033     -0.207     -9.497       0.001       0.046         0.046
               MC s.e.      2.613      8.333     15.376       2.615       2.614         2.614
E wrong        bias        -0.001      0.132    210.450       0.066      -0.089        -0.087
               MC s.e.      2.614      4.373   2336.92        4.891       2.619         2.615
Y, E wrong     bias        -9.869    -13.535    210.454     -33.090      -1.4609       -2.487
               MC s.e.      3.322      5.256   2336.92      375.334       5.187         4.245
Y, M wrong     bias        -9.355    -10.220     -9.496      -4.346      -3.579        -3.579
               MC s.e.      3.224     10.539     15.376       3.912       3.480         3.441
E, M wrong     bias        -0.032      0.132    205.060       0.088      -0.001        -3.77 x 10^[illegible]
               MC s.e.      2.614      4.373   2289.788       4.763       2.623         2.618
Y, E, M wrong  bias        -9.355    -13.535    205.060     -37.757      -4.223        -5.253
               MC s.e.      3.224      5.356   2289.78      379.122       5.835         4.828

M_ym: $\hat\theta_0^{ym}$; M_ye: $\hat\theta_0^{ye}$; M_em: $\hat\theta_0^{em}$;
M_union: $\hat\theta_0^{triply}$; M_union(†,1): $\hat\theta_0^{triply,\dagger,1}$;
M_union(†,2): $\hat\theta_0^{triply,\dagger,2}$.
* Monte Carlo standard error.

Table 5
Simulation results, n = 1000

                            M_ym                   M_ye       M_em                 M_union              M_union(†,1)  M_union(†,2)
All correct    bias         0.0324                 0.004     -0.106                0.034               -0.047        -0.047
               MC s.e.*     1.136                  3.06       6.490                1.136                1.137         1.137
Y wrong        bias       -10.256                -10.305     -0.106                0.063               -0.147        -0.148
               MC s.e.      1.675                  4.005      6.490                1.769                1.419         1.407
M wrong        bias        -5 x 10^[illegible]     0.004     -9.706                0.033                0.076         0.076
               MC s.e.      1.136                  3.060      5.395                1.137                1.137         1.135
E wrong        bias         0.032                  0.135      2.4 x 10^[illegible] 1908.76             -0.038        -0.030
               MC s.e.      1.136                  1.794      4.3 x 10^[illegible] 53911.63             1.400         1.242
Y, E wrong     bias       -10.256                -14.011      2.4 x 10^[illegible] -1.1 x 10^[illegible] 6.201        1.024
               MC s.e.      1.675                  2.386      4.3 x 10^[illegible] 2.1 x 10^[illegible]  9.406        5.097
Y, M wrong     bias        -9.705                -10.305     -9.706               -4.216               -3.555        -3.557
               MC s.e.      1.626                  4.004      5.395                1.667                1.527         1.510
E, M wrong     bias         5.7 x 10^[illegible]   0.135      2.5 x 10^[illegible] 2034.83              0.0539        0.0599
               MC s.e.      1.136                  1.794      4.6 x 10^[illegible] 56090.10             1.429         1.272
Y, E, M wrong  bias        -9.075                -14.011      2.5 x 10^[illegible] -1.2 x 10^[illegible] 4.659       -0.755
               MC s.e.      1.626                  2.386      4.6 x 10^[illegible] 2.2 x 10^[illegible] 10.121        5.910

M_ym: $\hat\theta_0^{ym}$; M_ye: $\hat\theta_0^{ye}$; M_em: $\hat\theta_0^{em}$;
M_union: $\hat\theta_0^{triply}$; M_union(†,1): $\hat\theta_0^{triply,\dagger,1}$;
M_union(†,2): $\hat\theta_0^{triply,\dagger,2}$.
* Monte Carlo standard error.

7. A comparison to some existing estimators. In this section, we briefly 
compare the proposed approach to some existing estimators in the literature. 
Perhaps the most common approach for estimating direct and indirect effects 
when Y is continuous uses a system of linear structural equations; whereby, 
a linear structural equation for the outcome, given the exposure, the medi- 
ator and the confounders, is combined with a linear structural equation for 
the mediator, given the exposure and confounders, to produce an estima- 
tor of natural direct and indirect effects. The classical approach of Baron 
and Kenny (1986) is a particular instance of this approach. In recent work, 
mainly motivated by Pearl's mediation functional, several authors [Imai,
Keele and Tingley (2010), Imai, Keele and Yamamoto (2010), Pearl (2011),
VanderWeele (2009), VanderWeele and Vansteelandt (2010)] have demonstrated
how the simple linear structural equation approach generalizes to
accommodate both the presence of an interaction between exposure and
mediator variables, and a nonlinear link function, either in the regression
model for the outcome, or in the regression model for the mediator, or both.
In fact, when the effect of confounders is also modeled in such structural
equations, inferences based on the latter can be viewed as special instances
of inferences obtained under a particular specification of model
$\mathcal{M}_a$ for the outcome and the mediator densities. And thus, as
previously shown in the simulations, an estimator obtained under a system of
structural equations will generally fail to produce a consistent estimator of
natural direct and indirect effects when model $\mathcal{M}_a$ is incorrect,
whereas, by using the proposed multiply robust estimator, valid inferences can
be recovered under the union
model $\mathcal{M}_b \cup \mathcal{M}_c$, even if $\mathcal{M}_a$ fails.
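For comparison, the simplest version of the linear structural equation approach described in this section (main effects only, no exposure-mediator interaction, identity links) can be sketched as below. The function names are hypothetical, and the sketch assumes both linear models are correct; in that special case the direct effect is the exposure coefficient in the outcome regression and the indirect effect is a product of two coefficients.

```python
import numpy as np

def ols(design, y):
    """Ordinary least squares coefficients for y ~ design."""
    coef, *_ = np.linalg.lstsq(design, y, rcond=None)
    return coef

def linear_sem_effects(Y, E, M, X):
    """Direct and indirect effects from two main-effects linear models.

    Mediator model: M ~ 1 + E + X
    Outcome model : Y ~ 1 + E + M + X
    ASSUMES no E-by-M interaction and correctly specified linear means.
    """
    ones = np.ones(len(Y))
    a = ols(np.column_stack([ones, E, X]), M)      # a[1]: effect of E on M
    b = ols(np.column_stack([ones, E, M, X]), Y)   # b[1]: E on Y, b[2]: M on Y
    direct = b[1]
    indirect = b[2] * a[1]
    return direct, indirect
```

If either linear model is wrong, this plug-in inherits the inconsistency discussed above, which is the motivation for the multiply robust alternative.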

A notable improvement on the system of structural equations approach 
is the double robust estimator of a natural direct effect due to van der 
Laan and Petersen (2005). Their estimator solves the estimating equation 

constructed using an empirical version of S^j^^^^^^^^{6o,6o) given in the 
online Appendix. They show their estimator remains CAN in the larger
submodel $\mathcal{M}_a \cup \mathcal{M}_c$ and therefore, they can recover
valid inferences even when the outcome model is incorrect, provided both the
exposure and mediator models are correct. Unfortunately, the van der Laan
estimator is still not entirely satisfactory because, unlike the proposed
multiply robust estimator, it requires that the model for the mediator density
is correct. Nonetheless, if the mediator model is correct, the authors
establish that their estimator achieves the efficiency bound for model
$\mathcal{M}_a \cup \mathcal{M}_c$ at the intersection submodel
$\mathcal{M}_a \cap \mathcal{M}_c$ where all models are correct; and thus it is
locally semiparametric efficient in $\mathcal{M}_a \cup \mathcal{M}_c$.
Interestingly, as we report in the online supplement, the semiparametric
efficiency bounds for models $\mathcal{M}_a \cup \mathcal{M}_c$ and
$\mathcal{M}_a \cup \mathcal{M}_b \cup \mathcal{M}_c$ are distinct, because the
density of the mediator variable is not ancillary for inferences about the
M-functional. Thus, any restriction placed on the mediator's conditional
density can, when correct, produce improvements in efficiency. This is in stark
contrast with the role played by the density of the exposure variable, which,
as in the estimation of the marginal causal effect, remains ancillary for
inferences about the M-functional, and thus the efficiency bound for the latter
is unaltered by any additional information on the former [Robins, Rotnitzky and
Zhao (1994)]. In the online Appendix, we provide a general functional map that
relates the efficient influence function for the larger model
$\mathcal{M}_a \cup \mathcal{M}_b \cup \mathcal{M}_c$ to the efficient influence
function for the smaller model $\mathcal{M}_a \cup \mathcal{M}_c$ where the
model for the mediator is either parametric or semiparametric. Our map is
instructive because it makes explicit, using simple geometric arguments, the
information that is gained from increasing restrictions on the law of the
mediator. In the online Appendix, we illustrate the map by recovering the
efficient influence function of van der Laan and Petersen in the case of a
singleton model (i.e., a known conditional density) for the mediator and in the
case of a parametric model for the mediator.

8. A semiparametric sensitivity analysis. We describe a semiparametric 
sensitivity analysis framework to assess the extent to which a violation of 
the ignorability assumption for the mediator might alter inferences about 
natural direct and indirect effects. Although only results for the natural 
direct effect are given here, the extension for the indirect effect is easily 
deduced from the presentation. Let 

$$t(e, m, x) = E[Y_{1,m} \mid E = e, M = m, X = x] - E[Y_{1,m} \mid E = e, M \neq m, X = x],$$

then

$$Y_{e',m} \not\perp\!\!\!\perp M \mid E = e, X,$$

that is, a violation of the ignorability assumption for the mediator variable,
generally implies that

$$t(e, m, x) \neq 0 \quad \text{for some } (e, m, x).$$

Thus, we proceed as in Robins, Rotnitzky and Scharfstein (2000), and pro- 
pose to recover inferences by assuming the selection bias function t{e,m,x) 
is known, which encodes the magnitude and direction of the unmeasured 
confounding for the mediator. In the following, the support of $M$,
$\mathcal{S}$, is assumed to be finite. To motivate the proposed approach,
suppose for the moment that $f_{M|E,X}(M \mid E, X)$ is known; then under the
assumption that the exposure is ignorable given $X$, we show in the Appendix
that

$$
\begin{aligned}
E[Y_{1,m} \mid M_0 = m, X = x]
&= E[Y_{1,m} \mid E = 0, M = m, X = x] \\
&= E[Y \mid E = 1, M = m, X = x] - t(1, m, x)(1 - f_{M|E,X}(m \mid E = 1, X = x)) \\
&\quad + t(0, m, x)(1 - f_{M|E,X}(m \mid E = 0, X = x)),
\end{aligned}
$$

and therefore the M-functional is identified by

$$
E\Big[\sum_{m \in \mathcal{S}} \{E[Y \mid E = 1, M = m, X] - t(1, m, X)(1 - f_{M|E,X}(m \mid E = 1, X))
+ t(0, m, X)(1 - f_{M|E,X}(m \mid E = 0, X))\}\, f_{M|E,X}(m \mid E = 0, X)\Big], \tag{5}
$$

which is equivalently represented as

$$
E\Big[\frac{I\{E = 1\} f_{M|E,X}(M \mid E = 0, X)}{f_{E|X}(1 \mid X) f_{M|E,X}(M \mid E = 1, X)}
\{Y - t(1, M, X)(1 - f_{M|E,X}(M \mid E = 1, X))
+ t(0, M, X)(1 - f_{M|E,X}(M \mid E = 0, X))\}\Big]. \tag{6}
$$

Below, these two equivalent representations, (5) and (6), are carefully
combined to obtain a double robust estimator of the M-functional, assuming
$t(\cdot,\cdot,\cdot)$ is known. A sensitivity analysis is then obtained by
repeating this process and reporting inferences for each choice of
$t(\cdot,\cdot,\cdot)$ in a finite set of user-specified functions
$\mathcal{T} = \{t_\lambda(\cdot,\cdot,\cdot) : \lambda\}$ indexed by a finite
dimensional parameter $\lambda$, with $t_0(\cdot,\cdot,\cdot) \in \mathcal{T}$
corresponding to the no unmeasured confounding assumption, that is,
$t_0(\cdot,\cdot,\cdot) = 0$. Throughout, the model
$f^{par}_{M|E,X}(\cdot \mid E, X)$ for the probability mass function of $M$ is
assumed to be correct. Thus, to implement the sensitivity analysis, we develop
a semiparametric estimator of the natural direct effect in the union model
$\mathcal{M}_a \cup \mathcal{M}_c$, assuming
$t(\cdot,\cdot,\cdot) = t_{\lambda^*}(\cdot,\cdot,\cdot)$ for a fixed
$\lambda^*$. The proposed doubly robust estimator of the natural direct effect
is then given by $\hat\theta_0^{doubly}(\lambda^*) - \hat\delta_0^{doubly}$,
where $\hat\delta_0^{doubly}$ is as previously described, and

$$
\hat\theta_0^{doubly}(\lambda^*) = \mathbb{P}_n\Big[
\frac{I\{E = 1\} \hat f^{par}_{M|E,X}(M \mid E = 0, X)}{\hat f^{par}_{E|X}(1 \mid X) \hat f^{par}_{M|E,X}(M \mid E = 1, X)}
\{Y - \hat E^{par}(Y \mid X, M, E = 1)\} + \hat\eta^{par}(1, 0, X; \lambda^*)\Big]
$$

with

$$
\hat\eta^{par}(1, 0, X; \lambda^*) = \sum_{m \in \mathcal{S}}
\{\hat E^{par}(Y \mid X, M = m, E = 1) + t_{\lambda^*}(0, m, X)(1 - \hat f^{par}_{M|E,X}(m \mid E = 0, X))
- t_{\lambda^*}(1, m, X)(1 - \hat f^{par}_{M|E,X}(m \mid E = 1, X))\}
\times \hat f^{par}_{M|E,X}(m \mid E = 0, X).
$$

Our sensitivity analysis then entails reporting the set
$\{\hat\theta_0^{doubly}(\lambda) - \hat\delta_0^{doubly} : \lambda\}$ (and the
associated confidence intervals), which summarizes how
sensitive inferences are to a deviation from the ignorability assumption
$\lambda = 0$. A theoretical justification for the approach is given by the
following formal result, which is proved in the supplemental Appendix.

Theorem 4. Suppose $t(\cdot,\cdot,\cdot) = t_{\lambda^*}(\cdot,\cdot,\cdot)$;
then under the consistency, positivity assumptions and the ignorability
assumption for the exposure,
$\hat\theta_0^{doubly}(\lambda^*) - \hat\delta_0^{doubly}$ is a CAN estimator of
the natural direct effect in $\mathcal{M}_a \cup \mathcal{M}_c$.

The influence function of $\hat\theta_0^{doubly}(\lambda^*)$ is provided in the
Appendix, and can be used to construct a corresponding confidence interval.

It is important to note that the sensitivity analysis technique presented 
here differs in crucial ways from previous techniques developed by Hafe- 
man (2008), VanderWeele (2010) and Imai, Keele and Yamamoto (2010). 
First, the methodology of VanderWeele (2010) postulates the existence of 
an unmeasured confounder U (possibly vector valued) which, when included 
in X, recovers the sequential ignorability assumption. The sensitivity analy- 
sis then requires specification of a sensitivity parameter encoding the effect of 
the unmeasured confounder on the outcome within levels of {E,X,M), and 
another parameter for the effect of the exposure on the density of the unmea- 
sured confounder given {X,M). This is a daunting task which renders the 
approach generally impractical, except perhaps in the simple setting where 
it is reasonable to postulate a single binary confounder is unobserved, and 
one is willing to make further simplifying assumptions about the required 
sensitivity parameters [VanderWeele (2010)]. In comparison, the proposed
approach circumvents this difficulty by concisely encoding a violation of the
ignorability assumption for the mediator through the selection bias function
$t_\lambda(e, m, x)$. Thus the approach makes no reference to, and is agnostic
about, the existence, dimension and nature of unmeasured confounders $U$.
Furthermore, in our proposal, the ignorability violation can arise due to an
unmeasured confounder of the mediator-outcome relationship that is also an
effect of the exposure variable, a setting not handled by the technique of
VanderWeele (2010). The method of Hafeman (2008), which is restricted to binary
data, shares some of the limitations given above. Finally, in contrast with our
proposed double robust approach, a coherent implementation of the sensitivity
analysis techniques of Imai, Keele and Yamamoto (2010), Imai, Keele
and Tingley (2010) and VanderWeele (2010) relies on correct specification of
all posited models. We refer the reader to VanderWeele (2010) for further
discussion of Hafeman (2008) and Imai, Keele and Yamamoto (2010).

9. Discussion. The main contribution of the current paper is a theoreti- 
cally rigorous yet practically relevant semiparametric framework for making 
inferences about natural direct and indirect causal effects in the presence 
of a large number of confounding factors. Semiparametric efficiency bounds 
are given for the nonparametric model, and multiply robust locally efficient 
estimators are developed that can be used when nonparametric estimation 
is not possible. 

Although the paper focuses on a binary exposure, we note that the ex- 
tension to a polytomous exposure is trivial. In future work, we shall extend 
our results for marginal effects by considering conditional natural direct and 
indirect effects, given a subset of pre-exposure variables [Tchetgen Tchetgen 
and Shpitser (2011)]. These models are particularly important in making 
inferences about so-called moderated mediation effects, a topic of growing 
interest, particularly in the field of psychology [Preacher, Rucker and Hayes 
(2007)]. In related work, we have recently extended our results to a survival 
analysis setting [Tchetgen Tchetgen (2011)]. 

A major limitation of the current paper is that it assumes that the me- 
diator is measured without error, an assumption that may be unrealistic 
in practice and, if incorrect, may result in biased inferences about medi- 
ated effects. We note that much of the recent literature on causal mediation 
analysis makes a similar assumption. In future work, it will be important to 
build on the results derived in the current paper to appropriately account 
for a mis-measured mediator [Tchetgen Tchetgen and Lin (2012)]. 



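Proof of Theorem 1. Let
$F_{O;t} = F_{Y|M,X,E;t} F_{M|E,X;t} F_{E|X;t} F_{X;t}$ denote a one-dimensional
regular parametric submodel of $\mathcal{M}_{nonpar}$, with $F_{O;0} = F_O$, and
let

$$\theta_t = \iint_{\mathcal{S} \times \mathcal{X}} E_t(Y \mid E = 1, M = m, X = x) f_{M|E,X;t}(m \mid E = 0, X = x) f_{X;t}(x)\, d\mu(m, x).$$

The efficient influence function $S^{eff,nonpar}_{\theta_0}(\theta_0)$ is the
unique random variable to satisfy the following equation:

$$\nabla_{t=0}\theta_t = E\{S^{eff,nonpar}_{\theta_0}(\theta_0)\, U\}$$

for $U$ the score of $F_{O;t}$ at $t = 0$, and $\nabla_{t=0}$ denoting
differentiation w.r.t. $t$ at $t = 0$. We observe that

$$
\begin{aligned}
\frac{\partial \theta_t}{\partial t}\Big|_{t=0}
&= \iint_{\mathcal{S} \times \mathcal{X}} \nabla_{t=0} E_t(Y \mid E = 1, M = m, X = x) f_{M|E,X}(m \mid E = 0, X = x) f_X(x)\, d\mu(m, x) \\
&\quad + \iint_{\mathcal{S} \times \mathcal{X}} E(Y \mid E = 1, M = m, X = x) \nabla_{t=0} f_{M|E,X;t}(m \mid E = 0, X = x) f_X(x)\, d\mu(m, x) \\
&\quad + \iint_{\mathcal{S} \times \mathcal{X}} E(Y \mid E = 1, M = m, X = x) f_{M|E,X}(m \mid E = 0, X = x) \nabla_{t=0} f_{X;t}(x)\, d\mu(m, x).
\end{aligned}
$$

Considering the first term, it is straightforward to verify that

$$
\begin{aligned}
&\iint_{\mathcal{S} \times \mathcal{X}} \nabla_{t=0} E_t(Y \mid E = 1, M = m, X = x) f_{M|E,X}(m \mid E = 0, X = x) f_X(x)\, d\mu(m, x) \\
&\quad = E\Big[U \frac{I(E = 1)}{f_{E|X}(E \mid X)} \frac{f_{M|E,X}(M \mid E = 0, X)}{f_{M|E,X}(M \mid E = 1, X)} \{Y - E(Y \mid X, M, E = 1)\}\Big].
\end{aligned}
$$

Similarly, one can easily verify that

$$
\begin{aligned}
&\iint_{\mathcal{S} \times \mathcal{X}} E(Y \mid E = 1, M = m, X = x) \nabla_{t=0} f_{M|E,X;t}(m \mid E = 0, X = x) f_X(x)\, d\mu(m, x) \\
&\quad = E\Big[U \frac{I(E = 0)}{f_{E|X}(E \mid X)} \{E(Y \mid E = 1, M, X) - \eta(1, 0, X)\}\Big],
\end{aligned}
$$

and finally, one can also verify that

$$
\begin{aligned}
&\iint_{\mathcal{S} \times \mathcal{X}} E(Y \mid E = 1, M = m, X = x) f_{M|E,X}(m \mid E = 0, X = x) \nabla_{t=0} f_{X;t}(x)\, d\mu(m, x) \\
&\quad = E[U\{\eta(1, 0, X) - \theta_0\}].
\end{aligned}
$$

Thus we obtain

$$
\begin{aligned}
S^{eff,nonpar}_{\theta_0}(\theta_0)
&= \frac{I(E = 1) f_{M|E,X}(M \mid E = 0, X)}{f_{E|X}(1 \mid X) f_{M|E,X}(M \mid E = 1, X)} \{Y - E(Y \mid X, M, E = 1)\} \\
&\quad + \frac{I(E = 0)}{f_{E|X}(0 \mid X)} \{E(Y \mid X, M, E = 1) - \eta(1, 0, X)\} + \eta(1, 0, X) - \theta_0.
\end{aligned}
$$

Given $S^{eff,nonpar}_{\delta_e}(\delta_e)$, the results for the direct and
indirect effect follow from
the fact that the influence function of a difference of two functionals equals
the difference of the respective influence functions. Because the model is
nonparametric, there is a unique influence function for each functional, and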
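it is efficient in the model, leading to the efficiency bound results. □

Proof of Theorem 2. We begin by showing that

$$E\{S^{eff,nonpar}_{\theta_0}(\theta_0; \beta_m^*, \beta_e^*, \beta_y^*)\} = 0 \tag{7}$$

under model $\mathcal{M}_{union}$. First note that
$(\beta_y^*, \beta_m^*) = (\beta_y, \beta_m)$ under model $\mathcal{M}_a$.
Equality (7) now follows because
$E^{par}(Y \mid X, M, E = 1; \beta_y) = E(Y \mid X, M, E = 1)$ and
$\eta(1, 0, X; \beta_y, \beta_m) = E[\{E^{par}(Y \mid X, M, E = 1; \beta_y)\} \mid E = 0, X] = \eta(1, 0, X)$:

$$
\begin{aligned}
E\{S^{eff,nonpar}_{\theta_0}(\theta_0; \beta_m, \beta_e^*, \beta_y)\}
&= E\Big[\frac{I\{E = 1\} f^{par}_{M|E,X}(M \mid E = 0, X; \beta_m)}{f^{par}_{E|X}(1 \mid X; \beta_e^*) f^{par}_{M|E,X}(M \mid E = 1, X; \beta_m)}
\underbrace{E\{Y - E^{par}(Y \mid X, M, E = 1; \beta_y) \mid E = 1, M, X\}}_{=0}\Big] \\
&\quad + E\Big[\frac{I\{E = 0\}}{f^{par}_{E|X}(0 \mid X; \beta_e^*)}
\underbrace{E[\{E^{par}(Y \mid X, M, E = 1; \beta_y) - \eta(1, 0, X; \beta_y, \beta_m)\} \mid E = 0, X]}_{=0}\Big] \\
&\quad + E[\eta(1, 0, X; \beta_y, \beta_m)] - \theta_0 \\
&= 0.
\end{aligned}
$$

Second, $(\beta_y^*, \beta_e^*) = (\beta_y, \beta_e)$ under model
$\mathcal{M}_b$. Equality (7) now follows because
$E^{par}(Y \mid X, M, E = 1; \beta_y) = E(Y \mid X, M, E = 1)$ and
$f^{par}_{E|X}(e \mid X; \beta_e) = f_{E|X}(e \mid X)$:

$$
\begin{aligned}
E\{S^{eff,nonpar}_{\theta_0}(\theta_0; \beta_m^*, \beta_e, \beta_y)\}
&= E\Big[\frac{I\{E = 1\} f^{par}_{M|E,X}(M \mid E = 0, X; \beta_m^*)}{f^{par}_{E|X}(1 \mid X; \beta_e) f^{par}_{M|E,X}(M \mid E = 1, X; \beta_m^*)}
E\{Y - E^{par}(Y \mid X, M, E = 1; \beta_y) \mid E = 1, M, X\}\Big] \\
&\quad + E\Big[\frac{I\{E = 0\}}{f^{par}_{E|X}(0 \mid X; \beta_e)}
E[\{E^{par}(Y \mid X, M, E = 1; \beta_y) - \eta(1, 0, X; \beta_y, \beta_m^*)\} \mid E = 0, X]\Big] \\
&\quad + E[\eta(1, 0, X; \beta_y, \beta_m^*)] - \theta_0 \\
&= E[E[\{E^{par}(Y \mid X, M, E = 1; \beta_y)\} \mid E = 0, X]] - \theta_0 = 0.
\end{aligned}
$$

Third, equality (7) holds under model $\mathcal{M}_c$ because

$$
\begin{aligned}
E\{S^{eff,nonpar}_{\theta_0}(\theta_0; \beta_m, \beta_e, \beta_y^*)\}
&= E\Big[\frac{I\{E = 1\} f^{par}_{M|E,X}(M \mid E = 0, X; \beta_m)}{f^{par}_{E|X}(1 \mid X; \beta_e) f^{par}_{M|E,X}(M \mid E = 1, X; \beta_m)}
E\{Y - E^{par}(Y \mid X, M, E = 1; \beta_y^*) \mid E = 1, M, X\}\Big] \\
&\quad + E\Big[\frac{I\{E = 0\}}{f^{par}_{E|X}(0 \mid X; \beta_e)}
E[\{E^{par}(Y \mid X, M, E = 1; \beta_y^*) - \eta(1, 0, X; \beta_y^*, \beta_m)\} \mid E = 0, X]\Big] \\
&\quad + E[\eta(1, 0, X; \beta_y^*, \beta_m)] - \theta_0 \\
&= E[E[\{E(Y \mid X, M, E = 1)\} \mid E = 0, X]] \\
&\quad - E[E[E^{par}(Y \mid X, M, E = 1; \beta_y^*) \mid E = 0, X]] \\
&\quad + E[E[E^{par}(Y \mid X, M, E = 1; \beta_y^*) \mid E = 0, X]] - E[\eta(1, 0, X; \beta_y^*, \beta_m)] \\
&\quad + E[\eta(1, 0, X; \beta_y^*, \beta_m)] - \theta_0 \\
&= E[E[\{E(Y \mid X, M, E = 1)\} \mid E = 0, X]] - \theta_0.
\end{aligned}
$$

Assuming that the regularity conditions of Theorem 1A in Robins, Mark
and Newey (1992) hold for
$S^{eff,nonpar}_{\theta_0}(\theta_0; \beta_m, \beta_e, \beta_y)$ and
$S_\beta(\beta)$, the expression for $S^{union}_{\theta_0}(\theta_0, \beta^*)$
follows by standard Taylor expansion arguments, and it now follows that

$$\sqrt{n}(\hat\theta_0^{triply} - \theta_0) = \frac{1}{\sqrt{n}} \sum_{i=1}^{n} S^{union}_{\theta_0, i}(\theta_0, \beta^*) + o_p(1). \tag{8}$$

The asymptotic distribution of $\sqrt{n}(\hat\theta_0^{triply} - \theta_0)$
under model $\mathcal{M}_{union}$ follows
from the previous equation by Slutsky's Theorem and the Central Limit
Theorem.

We note that $\hat\delta_e^{doubly}$ is CAN in the union model
$\mathcal{M}_{union}$ since it is CAN
in the larger model where either the density for the exposure is correct, or
the density of the mediator and the outcome regression are both correct
and thus $\eta(e, e, X; \beta_y^*, \beta_m^*) = E(Y \mid X, E = e)$. This gives
the multiply robust result for direct and indirect effects. The asymptotic
distribution of direct and indirect effect estimates then follows from similar
arguments as above.

At the intersection submodel

$$\frac{\partial E\{S^{eff,nonpar}_{\theta_0}(\theta_0, \beta)\}}{\partial \beta^T} = 0;$$

hence

$$S^{union}_{\theta_0}(\theta_0, \beta) = S^{eff,nonpar}_{\theta_0}(\theta_0, \beta).$$

The semiparametric efficiency claim then follows for $\hat\theta_0^{triply}$,
and a similar argument gives the result for direct and indirect effects. □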
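The estimating function used throughout the proofs above suggests a direct plug-in computation once fitted nuisance values are available. The following is a sketch only: the function name and argument names are hypothetical, the fitted values are assumed to be supplied by the analyst from whatever working models were chosen, and no variance estimate is included.

```python
import numpy as np

def triply_robust_theta0(Y, E, M, fE1, fM0, fM1, mu1, eta10):
    """Plug-in average of the three-term estimating function.

    fE1   : fitted P(E=1 | X_i)
    fM0   : fitted f(M_i | E=0, X_i)
    fM1   : fitted f(M_i | E=1, X_i)
    mu1   : fitted E(Y | X_i, M_i, E=1)
    eta10 : fitted eta(1,0,X_i) = sum_m E(Y | X_i, m, E=1) f(m | E=0, X_i)
    """
    # Weighted outcome residual among the exposed
    term1 = (E == 1) * fM0 / (fE1 * fM1) * (Y - mu1)
    # Weighted mediator-level residual among the unexposed
    term2 = (E == 0) / (1.0 - fE1) * (mu1 - eta10)
    return np.mean(term1 + term2 + eta10)
```

Consistent with the multiple robustness argument, supplying correct exposure and mediator densities but a wrong outcome regression (for example, all zeros for `mu1` and `eta10`) should still recover the target in large samples.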

Proofs of Theorems 3 and 4. The proofs are given in the online
Appendix. □

Supplemental Appendix to Semiparametric theory for causal mediation
analysis (DOI: 10.1214/12-AOS990SUPP; .pdf). The supplementary material
gives the semiparametric efficiency theory for estimation of natural direct
effects with a known model for the mediator density. The Appendix also
gives the proof of Theorem 3 (stated in the Supplementary Appendix) and
of Theorem 4.